1 Introduction and statement of the main results

In the seminal works [46, 47] Schrödinger addressed the problem of finding the most likely evolution of a cloud of independent Brownian particles conditionally on the observation of their initial and final configuration. In modern language this is an entropy minimization problem with marginal constraints. The aim of this work is to take the first steps in the understanding of the Mean Field Schrödinger Problem, obtained by replacing in the above description the independent particles by interacting ones.

To obtain an informal description of the problem, consider N Brownian particles \((X^{i,N}_t)_{t\in [0,T],1\le i \le N}\) interacting through a pair potential W

$$\begin{aligned} \left\{ \begin{array}{rl} \mathrm {d}X_t^{i,N}=&{}-\frac{1}{N} \sum \limits _{k=1}^N\nabla W(X_t^{i,N}-X_t^{k,N})\mathrm {d}t+\mathrm {d}B^i_t \\ X_0^{i,N}\sim &{} \mu ^{\mathrm {in}}. \end{array} \right. \end{aligned}$$
(1)

Their evolution is encoded in the random empirical path measure

$$\begin{aligned} \frac{1}{N}\sum \limits _{i=1}^N\delta _{X^{i,N}_{\cdot }}. \end{aligned}$$
(2)
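To make (1)–(2) concrete, the particle system can be approximated with an Euler–Maruyama scheme. The sketch below is illustrative only: the quadratic pair potential \(W(z)=\kappa |z|^2/2\), the Gaussian initial law and all numerical parameters are our own choices (this potential does satisfy (H1)).

```python
import numpy as np

def simulate_particles(N=500, T=1.0, dt=0.01, kappa=1.0, seed=0):
    """Euler-Maruyama discretization of (1) in d = 1 with W(z) = kappa*|z|^2/2.

    Then grad W(z) = kappa*z, so the drift on particle i is
    -(kappa/N) * sum_k (x_i - x_k) = -kappa * (x_i - mean(x)).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = rng.normal(0.0, 1.0, size=N)        # X_0^{i,N} ~ mu_in = N(0, 1)
    path = np.empty((n_steps + 1, N))
    path[0] = x
    for n in range(n_steps):
        drift = -kappa * (x - x.mean())     # pairwise sum collapses for quadratic W
        x = x + drift * dt + np.sqrt(dt) * rng.normal(size=N)
        path[n + 1] = x
    return path

path = simulate_particles()
```

The empirical path measure (2) is then the uniform measure on the N columns of `path`. For a general W the interaction drift costs \(O(N^2)\) per step; the quadratic choice collapses it to a mean-field term.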

At a given time T, the configuration of the particle system is visible to an external observer who finds it close to an “unexpected” (écart spontané et considérable in [47]) probability measure \(\mu ^{\mathrm {fin}}\), namely

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^N\delta _{X^{i,N}_{T}} \approx \mu ^{\mathrm {fin}}. \end{aligned}$$
(3)

It is a classical result [4, 20, 49] that the sequence of empirical path measures (2) satisfies a large deviations principle (LDP). Thus, the problem of finding the most likely evolution conditionally on the observations is recast as the problem of minimizing the LDP rate function among all path measures whose marginal at time 0 is \(\mu ^{\mathrm {in}}\) and whose marginal at time T is \(\mu ^{\mathrm {fin}}\). This is the mean field Schrödinger problem (MFSP). Extending the classical terminology naturally, we say that an optimal path measure is a mean field Schrödinger bridge (henceforth MFSB) and that the optimal value is the mean field entropic cost. The latter generalizes both the Wasserstein distance and the entropic cost.

The classical Schrödinger problem has been the object of intense recent research activity (see [36]). This is due to the computational advantages deriving from introducing an entropic penalization in the Monge–Kantorovich problem [19], or to its relations with functional inequalities, entropy estimates and the geometrical aspects of optimal transport. Our article contributes to this second line of research, recently explored by the papers [17, 28, 32, 34, 43, 44]. Leaving all precise statements to the main body of the introduction, let us give a concise summary of our contributions.

Dynamics of mean field Schrödinger bridges Our mean field version of the Schrödinger problem stems from fundamental results in large deviations for weakly interacting particle systems such as [20, 49] and shares some analogies with the control problems considered in [16] and with the article [2] in which an entropic formulation of second order variational mean field games is studied. Among the more fundamental results we establish for the mean field Schrödinger problem, we highlight

  • the existence of MFSBs and, starting from the original large deviations formulation, the derivation of both an equivalent reformulation in terms of a McKean–Vlasov control problem as well as a Benamou-Brenier formula,

  • establishing that MFSBs solve forward backward stochastic differential equations (FBSDE) of McKean–Vlasov type (cf. [10, 11]).

The proof strategy we adopt in this article combines ideas coming from large deviations and stochastic calculus of variations, see [18, 23, 52]. Another interesting consequence of having a large deviations viewpoint is that we can also exhibit some regularity properties of MFSBs, taking advantage of Föllmer’s results [25] on time reversal. Building on [17, 28] we establish a link between FBSDEs and the Riemannian calculus on probability measures introduced by Otto [41] that is of independent interest and underlies our proof strategies. In a nutshell, the seminal article [31] established that the heat equation is the gradient flow of the relative entropy w.r.t. the squared Wasserstein distance. Thus, classical first order SDEs yield probabilistic representations for first order ODEs in the Riemannian manifold of optimal transport. Our observation may be seen as the second order counterpart to the results of [31]: indeed we will present a heuristic strongly supporting the fact that Markov solutions of “second order” trajectorial equations (FBSDEs) yield probabilistic representations for second order ODEs in the Riemannian manifold of optimal transport.

Ergodicity of Schrödinger bridges and functional inequalities Consider again (1) and assume that W is convex so that the particle system is rapidly mixing and there is a well defined notion of equilibrium configuration \(\mu _{\infty }\). If N and T are large, one expects that

  1. (i)

    The configurations \(\frac{1}{N}\sum _{i=1}^N\delta _{X^i_t}\) at times \(t=0,T/2,T\) are almost independent.

  2. (ii)

    The configuration at T/2 is with high probability very similar to \(\mu _{\infty }\).

Because of (i), even when the external observer acquires the information (3), they still expect (ii) to hold. Thus mean field Schrödinger bridges are expected to spend most of their time around the equilibrium configuration. All our quantitative results originate in an attempt to justify this claim rigorously.

In this work we obtain a number of precise quantitative energy dissipation estimates. These lead us to the main quantitative results of the article:

  • we characterize the long time behavior of MFSBs, proving exponential convergence to equilibrium with sharp exponential rates,

  • we derive a novel class of functional inequalities involving the mean field entropic cost. Precisely, we obtain a Talagrand inequality and an HWI inequality that generalize those previously obtained in [12] by Carrillo, McCann and Villani.

Regarding the second point above, we can in fact retrieve (formally) the inequalities in [12] by looking at asymptotic regimes for the mean field Schrödinger problem. Besides the intrinsic interest and their usefulness in establishing some of our main results, our functional inequalities may have consequences in terms of concentration of measure and hypercontractivity of non linear semigroups, but this is left to future work.

The fact that optimal curves of a given optimal control problem spend most of their time around an equilibrium is known in the literature as the turnpike property. The first turnpike theorems were established in the 1960s for problems arising in econometrics [39]; general results for deterministic finite dimensional problems are by now available, see [50]. In view of the McKean–Vlasov formulation of the mean field Schrödinger problem, some of our results may be viewed as turnpike theorems as well, but for a class of infinite dimensional and stochastic problems. An interesting feature is that, by exploiting the specific structure of our setting, we are able to establish the turnpike property in a quantitative, rather than qualitative, form. The McKean–Vlasov formulation also connects our findings with the study of the long time behavior of mean field games [5, 7, 8, 9].

Concerning the proof methods, our starting point is Otto calculus and the recent rigorous results of [17], together with the heuristics put forward in [28]. The first new ingredient of our proof strategy is the above mentioned connection between FBSDEs and Otto calculus, which plays a key role in turning the heuristics into rigorous statements. It is worth remarking that the trajectorial approach does not just provide a way of making some heuristics rigorous: it also yields a stronger form of some of the results conjectured in [28], which then simply follow by averaging trajectorial estimates. The second new ingredient in our proofs involves a conserved quantity that plays a role analogous to the total energy of a physical system. For this quantity we derive a further functional inequality, which seems to be novel already in the classical Schrödinger problem (i.e. for independent particles) and allows us to establish the turnpike property.

Structure of the article In the remainder of this introductory section we state and comment on our main results. In Sect. 2 we provide a geometrical interpretation, sketching some interesting heuristic connections between optimal transport and stochastic calculus. The material of this section is not used later on; the reader who is not interested in optimal transport may therefore skip it. Sections 3 and 4 contain the proofs of our main results, the former being devoted to the results concerning the dynamics of MFSBs and the latter dealing with the ergodic results. Finally, an appendix contains some technical results.

1.1 Frequently used notation

  • \((\Omega ,{\mathcal {F}}_t,{\mathcal {F}}_T)\) is the canonical space of \({\mathbb {R}}^d\)-valued continuous paths on [0, T], so \(\{{\mathcal {F}}_{t}\}_{t\le T}\) is the coordinate filtration. \(\Omega \) is endowed with the uniform topology.

  • \({\mathcal {P}}(\Omega )\) and \({\mathcal {P}}({\mathbb {R}}^d)\) denote the set of Borel probability measures on \(\Omega \) and \({\mathbb {R}}^d\) respectively.

  • \((X_t)_{t\in [0,T]}\) is the canonical (i.e. identity) process on \(\Omega \).

  • \(\mathrm {R}^\mu \) is the Wiener measure with starting distribution \(\mu \).

  • \({\mathcal {H}}(\mathrm {P}|\mathrm {Q})\) denotes the relative entropy of \(\mathrm {P}\) with respect to \(\mathrm {Q}\), defined as \({\mathbb {E}}_\mathrm {P}\left[ \log \left( \frac{\mathrm {d}\mathrm {P}}{\mathrm {d}\mathrm {Q}} \right) \right] \) if \(\mathrm {P}\ll \mathrm {Q}\) and \(+\infty \) otherwise.

  • \(\mathrm {P}_t\) denotes the marginal distribution of a measure \(\mathrm {P}\in {\mathcal {P}}(\Omega )\) at time t.

  • \({\mathcal {P}}_{\beta }(\Omega )\) is the set of measures on \(\Omega \) for which \(\sup _{t\le T}|X_t|^\beta \) is integrable. \({\mathcal {P}}_{\beta }({\mathbb {R}}^d)\) is the set of measures on \({\mathbb {R}}^d\) for which the function \(|\cdot |^\beta \) is integrable.

  • The \(\beta \)-Wasserstein distance on \({\mathcal {P}}_{\beta }(\Omega )\) is defined by

    $$\begin{aligned} {\mathcal {P}}_{\beta }(\Omega )^2\ni (\mathrm {P},\mathrm {Q})\mapsto {\mathcal {W}}_\beta (\mathrm {P},\mathrm {Q}):=\left( \inf _{Y\sim \mathrm {P},Z\sim \mathrm {Q}}{\mathbb {E}}\left[ \sup _{t\in [0,T]}|Y_t-Z_t|^\beta \right] \right) ^{1/\beta }. \end{aligned}$$

    With a slight abuse of notation we also denote by \({\mathcal {W}}_\beta \) the \(\beta \)-Wasserstein distance on \({\mathcal {P}}_{\beta }({\mathbb {R}}^d)\) defined analogously.

  • For a given measurable marginal flow \([0,T]\ni t\mapsto \mu _t \in {\mathcal {P}}({\mathbb {R}}^d)\), we denote by \(L^2((\mu _t)_{t\in [0,T]})\) the space of square integrable functions from \([0,T]\times {\mathbb {R}}^d\) to \({\mathbb {R}}^d\) associated to the reference measure \(\mu _t(\mathrm {d}x)\mathrm {d}t\) and the corresponding almost-sure identification. We consider likewise the Hilbert space

    $$\begin{aligned} \mathrm {H}_{-1}((\mu _t)_{t\in [0,T]}), \end{aligned}$$

    defined as the closure in \(L^2((\mu _t)_{t\in [0,T]})\) of the smooth subspace

    $$\begin{aligned} \left\{ \Psi :[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\ {\text {s.t.}} \ \Psi = \nabla \psi ,\, \psi \in {\mathcal {C}}^{\infty }_c([0,T] \times {\mathbb {R}}^d) \right\} . \end{aligned}$$
  • \(\gamma \) and \(\lambda \) are respectively the standard Gaussian and Lebesgue measure in \({\mathbb {R}}^d\).

  • \({\mathcal {C}}^{l,m}([0,T]\times {\mathbb {R}}^d;{\mathbb {R}}^k)\) is the set of functions from \([0,T]\times {\mathbb {R}}^d\) to \( {\mathbb {R}}^k\) which have l continuous derivatives in the first (i.e. time) variable and m continuous derivatives in the second (i.e. space) variable. The space \({\mathcal {C}}^{m}({\mathbb {R}}^d;{\mathbb {R}}^k)\) is defined in the same way. \({\mathcal {C}}^{\infty }_c([0,T]\times {\mathbb {R}}^d)\) is the space of real-valued smooth functions on \([0,T]\times {\mathbb {R}}^d\) with compact support. The gradient \(\nabla \) and Laplacian \(\Delta \) act only in the space variable.

  • If f is a function and \(\mu \) a measure, their convolution is \(x\mapsto f*\mu (x):=\int f(x-y)\mu (\mathrm {d}y)\).
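As a sanity check of this notation: for a discrete measure \(\mu =\sum _j w_j\delta _{y_j}\), the convolution reduces to the finite sum \(f*\mu (x)=\sum _j w_j f(x-y_j)\). A minimal sketch (the function and the atoms are arbitrary illustrative choices):

```python
import numpy as np

def convolve_with_measure(f, atoms, weights):
    """Return x -> f*mu(x) = int f(x - y) mu(dy) for mu = sum_j weights[j] * delta_{atoms[j]}."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return lambda x: float(np.sum(weights * f(x - atoms)))

# mu = (delta_1 + delta_{-1}) / 2 and f(z) = z^2:
f_conv = convolve_with_measure(lambda z: z ** 2, [1.0, -1.0], [0.5, 0.5])
# f*mu(0) = 0.5*f(-1) + 0.5*f(1) = 1
```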

1.2 The mean field Schrödinger problem and its equivalent formulations

We are given a so-called interaction potential \(W:{\mathbb {R}}^d\rightarrow {\mathbb {R}},\) for which we assume

$$\begin{aligned} W \text { is of class } {\mathcal {C}}^{2}({\mathbb {R}}^d;{\mathbb {R}}) \text { and symmetric, i.e. } W(\cdot )=W(-\cdot ), \\ \sup _{z,v \in {\mathbb {R}}^d, |v|=1} v\cdot \nabla ^2W(z) \cdot v < +\infty . \end{aligned}$$
(H1)

Besides the interaction potential, the data of the problem are a pair of probability measures \(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}\) on which we impose

$$\begin{aligned} \mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}\in {\mathcal {P}}_2({\mathbb {R}}^d)\text { and } \tilde{{\mathcal {F}}}(\mu ^{\mathrm {in}}),\tilde{{\mathcal {F}}}(\mu ^{\mathrm {fin}})<+\infty , \end{aligned}$$
(H2)

where the free energy or entropy functional \(\tilde{{\mathcal {F}}}\) is defined for \(\mu \in {\mathcal {P}}_2({\mathbb {R}}^d)\) by

$$\begin{aligned} \tilde{{\mathcal {F}}}(\mu ) = {\left\{ \begin{array}{ll} \int _{{\mathbb {R}}^d} \log \mu (x) \mu (\mathrm {d}x)+ \int _{{\mathbb {R}}^d} W*\mu (x) \mu (\mathrm {d}x), \quad &{} \text{ if } \mu \ll \lambda \\ +\infty , \quad &{} {\text{ otherwise. }} \end{array}\right. } \end{aligned}$$
(4)

In the above, and in the rest of the article, we shall make no distinction between a measure and its density against Lebesgue measure \(\lambda \), provided it exists.
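Both terms of (4) are explicit for the standard Gaussian in \(d=1\) with the illustrative quadratic choice \(W(z)=|z|^2/2\): the entropy term equals \(-\frac{1}{2}\log (2\pi e)\approx -1.419\), and the interaction term equals \({\mathbb {E}}[W(X-Y)]=\frac{1}{2}{\mathbb {E}}|X-Y|^2=1\) for independent \(X,Y\sim \mu \). A Monte Carlo sketch of this computation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)   # samples from mu = N(0, 1)
y = rng.normal(size=n)   # independent copy, used for the interaction term

# entropy term: E[log mu(X)], with mu the standard Gaussian density
entropy_term = np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * x ** 2)

# interaction term: int (W * mu) dmu = E[W(X - Y)], with W(z) = z^2 / 2
interaction_term = np.mean(0.5 * (x - y) ** 2)

free_energy = entropy_term + interaction_term   # close to 1 - 0.5*log(2*pi*e)
```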

We recall that the McKean–Vlasov dynamics is the non linear SDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm {d}Y_t = - \nabla W*\mu _t (Y_t)\mathrm {d}t + \mathrm {d}B_t, \\ Y_0\sim \mu ^{\mathrm {in}}, \quad \mu _t = \mathrm {Law}(Y_t), \quad \forall t\in [0,T]. \end{array}\right. } \end{aligned}$$
(5)

Under the hypothesis (H1), it is a classical result (see e.g. [13, Thm 2.6]) that (5) admits a unique strong solution whose law we denote \(\mathrm {P}^{\text {{MKV}}}\). The functional \(\tilde{{\mathcal {F}}}\) plays a crucial role in the sequel. For the moment, let us just remark that the marginal flow of the McKean–Vlasov dynamics may be viewed as the gradient flow of \(\frac{1}{2}\tilde{{\mathcal {F}}}\) in the Wasserstein space \(({\mathcal {P}}_2({\mathbb {R}}^d),{\mathcal {W}}_2(\cdot ,\cdot ))\).

If \(\mathrm {P}\in {\mathcal {P}}_{1}(\Omega ) \) is given, then the stochastic differential equation

$$\begin{aligned} \left\{ \begin{array}{rrl} \mathrm {d}Z_t&{}=&{} -\nabla W *\mathrm {P}_t (Z_t)\mathrm {d}t+\mathrm {d}B_t, \\ Z_0&{}\sim &{} \mu ^{\mathrm {in}}, \end{array} \right. \end{aligned}$$

admits a unique strong solution (cf. Sect. 3.2) whose law we denote \(\Gamma (\mathrm {P})\). With this we can now introduce the main object of study of the article:

Definition 1.1

The mean field Schrödinger problem is

$$\begin{aligned} \inf \left\{ {\mathcal {H}}(\mathrm {P}| \Gamma (\mathrm {P}) ) \,:\,\, \mathrm {P}\in {\mathcal {P}}_{1}(\Omega ), \,\mathrm {P}_0=\mu ^{\mathrm {in}},\, \mathrm {P}_T=\mu ^{\mathrm {fin}}\right\} . \end{aligned}$$
(MFSP)

Its optimal value, denoted \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\), is called mean field entropic transportation cost. Its optimizers are called mean field Schrödinger bridges (MFSB).

It is not difficult to prove existence of optimizers for (MFSP). In the classical case, uniqueness is an easy consequence of the convexity of the entropy functional. However, the rate function \({\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))\) is not convex in general.

Proposition 1.1

Grant (H1), (H2). Then (MFSP) admits at least one optimal solution.

Remark 1.1

The McKean–Vlasov dynamics associated with the particle system (1) displays a wide array of different behaviors, including phase transitions; see [51] for example. Thus, we do not expect uniqueness of mean field Schrödinger bridges in general. However, when W is convex, although the rate function \({\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))\) is not convex in the usual sense, the entropy \({\mathcal {F}}\) is displacement convex in the sense of McCann [38]. This observation was indeed used to prove uniqueness of minimizers for \({\mathcal {F}}\), and could be the starting point towards uniqueness for (MFSP).

1.2.1 Large deviations principle (LDP)

We start by deriving the LDP interpretation of (MFSP). Recall the interacting particle system \((X^{i,N}_t)_{t\in [0,T],1\le i \le N}\) of (1). The theory of stochastic differential equations guarantees strong existence and uniqueness for this particle system under (H1), (H2). In the next theorem we obtain an LDP for the sequence of empirical path measures; in view of the classical results of [20], it is not surprising that the LDP holds. However, even the most recent works on large deviations for weakly interacting particle systems, such as [4], do not seem to cover the setting and scope of Theorem 1.1. Essentially, this is because in those references the LDPs are obtained for a topology that is weaker than the \({\mathcal {W}}_1\)-topology, which is what we need later on.

Theorem 1.1

In addition to (H1), (H2) assume that

$$\begin{aligned} \int _{{\mathbb {R}}^d}\exp (r|x|)\mu ^{\mathrm {in}}(\mathrm {d}x)<\infty \text { for all }r>0. \end{aligned}$$
(6)

Then the sequence of empirical measures

$$\begin{aligned} \left\{ \frac{1}{N}\sum _{i=1}^N\delta _{X^{i,N}} ; N\in {\mathbb {N}} \right\} , \end{aligned}$$

satisfies the LDP on \({\mathcal {P}}_{1}(\Omega )\) equipped with the \({\mathcal {W}}_1\)-topology, with good rate function given by

$$\begin{aligned} {\mathcal {P}}_{1}(\Omega )\ni \mathrm {P}\mapsto {\mathscr {I}}(\mathrm {P}):=\left\{ \begin{array}{ll} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P})),&{}\quad \mathrm {P}\ll \Gamma (\mathrm {P}), \\ +\infty ,&{}\quad {\text {otherwise}}. \end{array} \right. \end{aligned}$$
(7)

In fact we will prove in Sect. 3 a strengthened version of Theorem 1.1 where the drift term is much more general. For this, we will follow Tanaka’s elegant reasoning [49].

Remark 1.2

Heuristically, having a rate function means that \(\mathrm {Prob} [ \frac{1}{N} \sum _{i=1}^N \delta _{X^{i,N}_{\cdot }} \approx \mathrm {P}] \approx \exp (-N {\mathscr {I}}(\mathrm {P}))\). Hence Problem (MFSP) has the desired interpretation of finding the most likely evolution of the particle system conditionally on the observations (when N is very large).

1.2.2 McKean–Vlasov control and Benamou-Brenier formulation

We now reinterpret the mean field Schrödinger problem (MFSP) in terms of McKean–Vlasov stochastic control (also known as mean field control).

Lemma 1.1

Let \(\mathrm {P}\) be admissible for (MFSP). There exists a predictable process \((\alpha ^{\mathrm {P}}_t)_{t\in [0,T]}\) s.t.

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |\alpha ^{\mathrm {P}}_t|^2 \mathrm {d}t \right] <+\infty \end{aligned}$$
(8)

and so that

$$\begin{aligned} X_t - \int _{0}^t \left( -\nabla W *\mathrm {P}_s (X_s) + \alpha ^{\mathrm {P}}_s\right) \,\mathrm {d}s \end{aligned}$$
(9)

has law \(R^{\mu ^{\mathrm {in}}}\) under \(\mathrm {P}\). The problem (MFSP) is equivalent to

$$\begin{aligned} \inf \left\{ \frac{1}{2}{\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |\alpha ^{\mathrm {P}}_t|^2 \mathrm {d}t \right] \,:\,\, \mathrm {P}\in {\mathcal {P}}_{1}(\Omega ), \,\mathrm {P}_0=\mu ^{\mathrm {in}},\, \mathrm {P}_T=\mu ^{\mathrm {fin}},\, \alpha ^{\mathrm {P}}\text { as in } (9) \right\} , \end{aligned}$$
(10)

as well as to

$$\begin{aligned} \begin{aligned} \inf&\,\,\frac{1}{2}{\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |\Phi _t+\nabla W *\mathrm {P}_t (X_t) |^2 \mathrm {d}t \right] \\ {\text {s.t.}}&\,\, \mathrm {P}\in {\mathcal {P}}_{1}(\Omega ), \,\mathrm {P}_0=\mu ^{\mathrm {in}},\, \mathrm {P}_T=\mu ^{\mathrm {fin}},\, \mathrm {P}\circ \left( X_\cdot -\int _0^\cdot \Phi _s\mathrm {d}s \right) ^{-1}=R^{\mu ^{\mathrm {in}}}. \end{aligned} \end{aligned}$$
(11)

The formulations (10)–(11) can be seen as McKean–Vlasov stochastic control problems. In the first case one steers, through \(\alpha ^{\mathrm {P}}\), part of the drift of a McKean–Vlasov SDE. In the second case one controls the drift \(\Phi \) of a standard SDE, but the optimization cost depends non-linearly on the law of the controlled process. In both cases, the condition \(\mathrm {P}_T=\mu ^{\mathrm {fin}}\) is rather unconventional. By analogy with the theory of mean field games, and owing to this type of terminal condition, one could refer to (10)–(11) as planning McKean–Vlasov stochastic control problems.

The third and last formulation of (MFSP) we propose relates to the well known fluid dynamics representation of the Monge–Kantorovich distance due to Benamou and Brenier (cf. [53]), which has recently been extended to the standard entropic transportation cost [15, 27]. The interest of this formula is twofold: on the one hand it clearly shows that (MFSP) is equivalent to, and gives a rigorous meaning to, some of the generalized Schrödinger problems formally introduced in [28, 34]. On the other hand, it allows us to interpret (MFSP) as a control problem in the Riemannian manifold of optimal transport. This viewpoint, which we shall explore in more detail in Sect. 2, provides a strong guideline towards the study of the long time behavior of Schrödinger bridges.

We define the set \({\mathcal {A}}\) as the collection of all absolutely continuous curves \((\mu _t)_{t\in [0,T]}\subset {\mathcal {P}}_2({\mathbb {R}}^d)\) (cf. Sect. 4.2) such that \(\mu _0=\mu ^{\mathrm {in}},\mu _T=\mu ^{\mathrm {fin}}\) and

$$\begin{aligned} (t,z)\mapsto \nabla \log \mu _t(z)&\in L^2(\mathrm {d}\mu _t\mathrm {d}t),\\ (t,z)\mapsto \nabla W *\mu _t(z)&\in L^2(\mathrm {d}\mu _t\mathrm {d}t). \end{aligned}$$

We then define

$$\begin{aligned} {\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}):= \inf _{\begin{array}{c} (\mu _t)_{t\in [0,T]}\in {\mathcal {A}},\\ \partial _t \mu _t + \nabla \cdot (w_t \mu _t )=0 \end{array}} \, \frac{1}{2}\int _0^T\int _{{\mathbb {R}}^d} \left| w_t(z) + \frac{1}{2}\nabla \log \mu _t(z) + \nabla W *\mu _t(z)\right| ^2\mu _t(\mathrm {d}z)\, \mathrm {d}t. \end{aligned}$$
(12)
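A quick consistency check of (12): writing the Fokker–Planck equation of the McKean–Vlasov dynamics (5) as a continuity equation identifies its tangent velocity field, and with this choice the integrand in (12) vanishes identically,

```latex
\partial_t \mu_t \;=\; \tfrac{1}{2}\Delta \mu_t + \nabla\cdot\big((\nabla W * \mu_t)\,\mu_t\big)
\;=\; -\nabla\cdot\big(w_t\,\mu_t\big),
\qquad
w_t \;=\; -\tfrac{1}{2}\nabla \log \mu_t - \nabla W * \mu_t .
```

Hence, if \(\mu ^{\mathrm {fin}}\) is the time-T marginal of (5), then \({\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})=0\); this is consistent with (MFSP), since the choice \(\mathrm {P}=\mathrm {P}^{\text {{MKV}}}\) satisfies \(\Gamma (\mathrm {P})=\mathrm {P}\) and thus has zero relative entropy.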

Theorem 1.2

Let (H1), (H2) hold. Then

$$\begin{aligned} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) = {\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}). \end{aligned}$$

If \(\mathrm {P}\) is optimal for (MFSP) and the latter is finite, then \((\mathrm {P}_t)_{t\in [0,T]}\) is optimal in (12) and its associated tangent vector field w is given by

$$\begin{aligned} - \nabla W*\mathrm {P}_t(z)+ \Psi _t(z) -\frac{1}{2} \nabla \log \mathrm {P}_t(z), \end{aligned}$$

where \(\Psi \) is as in Theorem 1.3 below.

Conversely, if \((\mu _t)_{t\in [0,T]}\) is optimal for \({\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\) and the latter is finite, then there exists an optimizer of \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) \) whose marginal flow equals \((\mu _t)_{t\in [0,T]}\).

1.3 Mean field Schrödinger bridges

Leveraging the stochastic control interpretation, and building on the stochastic calculus of variations perspective, we obtain the following necessary optimality conditions for (MFSP).

Theorem 1.3

Assume (H1), (H2) and let \(\mathrm {P}\) be optimal for (MFSP). Then there exists \(\Psi \in \mathrm {H}_{-1}((\mathrm {P}_t)_{t\in [0,T]})\) such that

$$\begin{aligned} (\mathrm {d}t\times \mathrm {d}\mathrm {P}\text {-a.s.}) \quad \alpha ^{\mathrm {P}}_t= \Psi _t(X_t), \end{aligned}$$
(13)

where \((\alpha ^{\mathrm {P}}_t)_{t\in [0,T]}\) is related to \(\mathrm {P}\) as in Lemma 1.1. The process \(t\mapsto \Psi _t(X_t)\) is continuous and the process \((M_t)_{t\in [0,T]}\) defined by

$$\begin{aligned} M_t:=\Psi _t(X_t) - \int _{0}^t \tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}} \left[ \nabla ^2 W(X_s-\tilde{X}_s)\cdot (\Psi _s(X_s) - \Psi _s(\tilde{X}_s) )\right] \, \mathrm {d}s \end{aligned}$$
(14)

is a continuous martingale under \(\mathrm {P}\) on [0, T[, where \((\tilde{X}_t)_{t\in [0,T]}\) is an independent copy of \((X_t)_{t\in [0,T]}\) defined on some probability space \((\tilde{\Omega },\tilde{{\mathfrak {F}}},\tilde{\mathrm {P}})\) and \(\tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}}\) denotes the expectation on \((\tilde{\Omega },\tilde{{\mathfrak {F}}},\tilde{\mathrm {P}})\).

We shall refer to \(\Psi \) as the corrector of \(\mathrm {P}\). Correctors will play an important role in the ergodic results. In this part, we give an interpretation of Theorem 1.3 in terms of stochastic analysis (FBSDEs) and partial differential equations.

1.3.1 Planning McKean–Vlasov FBSDE for MFSB

We consider the following McKean–Vlasov forward-backward stochastic differential equation (FBSDE) in the unknowns (X, Y, Z):

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm {d}X_t= -\tilde{{\mathbb {E}}}[\nabla W (X_t-\tilde{X}_t) ]\mathrm {d}t + Y_t\mathrm {d}t+ \mathrm {d}B_t\\ \mathrm {d}Y_t = \tilde{{\mathbb {E}}}\big [\nabla ^2 W(X_t-\tilde{X}_t) \cdot (Y_t-\tilde{Y}_t)\big ]\mathrm {d}t + Z_t\cdot \mathrm {d}B_t\\ X_0\sim \mu ^{\mathrm {in}},\, X_T\sim \mu ^{\mathrm {fin}}. \end{array}\right. } \end{aligned}$$
(15)

As in the stochastic control interpretation of the mean field Schrödinger problem, here too the terminal condition \(X_T\sim \mu ^{\mathrm {fin}}\) is somewhat unconventional. We hence call this forward-backward system the planning McKean–Vlasov FBSDE.

Thanks to the results in Sect. 1.2.2 we can actually solve (15). If \(\mathrm {P}\) is optimal for (MFSP) with associated \(\Psi \) as recalled in Theorem 1.3 above, all we need to do is take \(Y_t:=\Psi _t(X_t)\) and reinterpret (9) for the dynamics of the canonical process X and (14) for the dynamics of Y (in the latter case using martingale representation).

One remarkable aspect of this connection between Schrödinger problems and FBSDEs is that one can prove existence of solutions to such FBSDEs by a purely variational method. Indeed, we remark that (15) is beyond the scope of existing FBSDE theory, such as Carmona and Delarue’s [11, Theorem 5.1]. Further, we also obtained for free an extra bit of information: the constructed process Y lives in \(\mathrm {H}_{-1}((\mathrm {P}_t)_{t\in [0,T]})\). This is in tandem with the usual heuristic relating FBSDEs and PDEs (where Y is conjectured to be an actual gradient) as explained in Carmona and Delarue’s [10, Remark 3.1]. In fact, if we make the additional assumption that \(Y_t=\nabla \psi _t(X_t)\) for some potential \(\psi _t(x)\), and we set \(\mu _t=(X_{t})_{\#}\mathrm {P}\), then after some computations we arrive at the PDE system:

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \mu _t(x) - \frac{1}{2}\Delta \mu _t(x)+ \nabla \cdot ((-\nabla W *\mu _t(x)+\nabla \psi _t(x))\mu _t(x))=0\\ \partial _t \nabla \psi _t(x)+ \frac{1}{2}\nabla \Delta \psi _t(x)+\nabla ^2 \psi _t(x) \cdot \big ( -\nabla W *\mu _t(x) + \nabla \psi _t(x)\big )\\ = \int _{{\mathbb {R}}^d} \nabla ^2W(x-\tilde{x})\cdot (\nabla \psi _t(x)-\nabla \psi _t(\tilde{x})) \mu _t(\mathrm {d}\tilde{x}),\\ \mu _0(x) = \mu ^{\mathrm {in}}(x), \mu _T(x) = \mu ^{\mathrm {fin}}(x). \end{array}\right. } \end{aligned}$$
(16)

1.3.2 Schrödinger potentials and the mean field planning PDE system

The PDE system (16) is the literal translation of the planning McKean–Vlasov FBSDE in the case when the process Y is an actual gradient, \(Y=\nabla \psi \). In the next corollary we show that if this is the case, and if \(\psi \) is sufficiently regular, then (16) can be rewritten as a system of two coupled PDEs, the first being a Hamilton–Jacobi–Bellman equation for \(\psi \), and the second one being a Fokker-Planck equation. This type of PDE system is the prototype of a planning mean field game [33].

Corollary 1.1

Let \(\mathrm {P}\) be an optimizer for (MFSP), \(\Psi _{\cdot }(\cdot )\) be as in Theorem 1.3 and set \(\mu _t=\mathrm {P}_t\) for all \(t\in [0,T]\). If \(\mu _{\cdot }(\cdot )\) is everywhere positive and of class \({\mathcal {C}}^{1,2}([0,T]\times {\mathbb {R}}^d;{\mathbb {R}})\) and \(\Psi _{\cdot }(\cdot )\) is of class \({\mathcal {C}}^{1,2}([0,T]\times {\mathbb {R}}^d;{\mathbb {R}}^d)\) then there exists \(\psi :[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that \(\Psi _t(x) =\nabla \psi _t(x)\) for all \((t,x)\in [0,T]\times {\mathbb {R}}^d\). Moreover, \((\psi _{\cdot }(\cdot ),\mu _{\cdot }(\cdot ))\) form a classical solution of

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \psi _t(x) + \frac{1}{2}\Delta \psi _t(x) + \frac{1}{2}|\nabla \psi _t(x)|^2 = \int _{{\mathbb {R}}^d} \nabla W(x-\tilde{x}) \cdot ( \nabla \psi _t(x)-\nabla \psi _t(\tilde{x}))\mu _t(\mathrm {d}\tilde{x}),\\ \partial _t \mu _t(x) - \frac{1}{2} \Delta \mu _t(x) + \nabla \cdot ((-\nabla W *\mu _t(x) + \nabla \psi _t(x))\mu _t(x) )=0,\\ \mu _0(x) = \mu ^{\mathrm {in}}(x), \mu _T(x) = \mu ^{\mathrm {fin}}(x). \end{array}\right. } \end{aligned}$$
(17)

A fundamental result [26, 57] concerning the structure of optimizers in the classical Schrödinger problem is that their density takes a product form, i.e.

$$\begin{aligned} \mu _t = \exp (\psi _t+\varphi _t), \end{aligned}$$

where \(\varphi _t(x)\), \(\psi _t(x)\) solve respectively the forward and backward Hamilton–Jacobi–Bellman equations

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\,\,\,\partial _t \psi + \frac{1}{2}\Delta \psi + \frac{1}{2}|\nabla \psi |^2=0,\\ -\partial _t \varphi + \frac{1}{2}\Delta \varphi + \frac{1}{2}|\nabla \varphi |^2 =0.\\ \end{array}\right. } \end{aligned}$$
(18)
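The system (18) is linear in disguise: under the Hopf–Cole transform \(h_t=e^{\psi _t}\), \(g_t=e^{\varphi _t}\), the identities \(\partial _t h=h\,\partial _t\psi \) and \(\Delta h=h(\Delta \psi +|\nabla \psi |^2)\) turn (18) into a pair of heat equations,

```latex
\partial_t h_t + \tfrac{1}{2}\Delta h_t
  \;=\; h_t\Big(\partial_t \psi_t + \tfrac{1}{2}\Delta \psi_t + \tfrac{1}{2}|\nabla \psi_t|^2\Big) \;=\; 0,
\qquad
-\partial_t g_t + \tfrac{1}{2}\Delta g_t \;=\; 0,
```

so h solves a backward and g a forward heat equation, and the product form reads \(\mu _t=\exp (\psi _t+\varphi _t)=h_tg_t\).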

It is interesting to see that this structure is preserved in (MFSP), at least formally. The effect of having considered interacting Brownian particles instead of independent ones is reflected in the fact that the two Hamilton–Jacobi–Bellman PDEs are coupled not only through the boundary conditions but also through their dynamics.

Corollary 1.2

Using the same notation and under the same hypotheses of Corollary 1.1, if we define \(\varphi :[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}\) via

$$\begin{aligned} \mu _t = \exp (-2 W*\mu _t + \varphi _t +\psi _t )\end{aligned}$$

then \((\psi _{\cdot }(\cdot ),\varphi _{\cdot }(\cdot ))\) solves

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\,\,\,\partial _t \psi _t(x) + \frac{1}{2}\Delta \psi _t(x) + \frac{1}{2}|\nabla \psi _t(x)|^2 = \int \nabla W(x-\tilde{x}) \cdot ( \nabla \psi _t(x)-\nabla \psi _t(\tilde{x}))\mu _t(\mathrm {d}\tilde{x}),\\ -\partial _t \varphi _t(x) + \frac{1}{2}\Delta \varphi _t(x) + \frac{1}{2}|\nabla \varphi _t(x)|^2 = \int \nabla W(x-\tilde{x}) \cdot ( \nabla \varphi _t(x)-\nabla \varphi _t(\tilde{x}))\mu _t(\mathrm {d}\tilde{x}).\\ \end{array}\right. } \end{aligned}$$

1.4 Convergence to equilibrium and functional inequalities

Our aim is to show that MFSBs spend most of their time in a small neighborhood of the equilibrium configuration \(\mu _{\infty }\), to study their long time behavior, and to derive a new class of functional inequalities involving the mean field entropic cost \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\).

Throughout this section we make the assumption that W is uniformly convex, i.e. that

$$\begin{aligned} \exists \kappa >0 \quad s.t. \quad \forall z\in {\mathbb {R}}^d, \quad \nabla ^2 W(z) \ge \kappa {\mathbb {I}}_{d\times d}, \end{aligned}$$
(H3)

where the inequality above has to be understood as an inequality between quadratic forms. Under (H3) the McKean–Vlasov dynamics associated with the particle system (1) converges in the long time limit to an equilibrium measure \(\mu _{\infty }\), which is found by minimizing the functional \(\tilde{{\mathcal {F}}}\) over the elements of \({\mathcal {P}}_2({\mathbb {R}}^d)\) whose mean is the same as that of \(\mu ^{\mathrm {in}}\). Existence and uniqueness of \(\mu _{\infty }\) have been proven in [38].
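As a concrete illustration, consider the toy potential \(W(z)=\frac{1}{2}|z|^2+\frac{1}{10}\cos (z_1)\) (our own example, not taken from the references): its Hessian is \({\mathbb {I}}+\mathrm {diag}(-0.1\cos z_1,0)\), so (H3) holds with \(\kappa =0.9\). A quick finite-difference sanity check of this constant:

```python
import numpy as np

def W(z):
    # toy pair potential: 0.5*|z|^2 + 0.1*cos(z_1); uniformly convex with kappa = 0.9
    return 0.5 * np.dot(z, z) + 0.1 * np.cos(z[0])

def hessian(f, z, h=1e-4):
    # central finite-difference Hessian of f at z
    d = len(z)
    H = np.zeros((d, d))
    I = np.eye(d) * h
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(z + I[i] + I[j]) - f(z + I[i] - I[j])
                       - f(z - I[i] + I[j]) + f(z - I[i] - I[j])) / (4 * h * h)
    return H

rng = np.random.default_rng(1)
# smallest Hessian eigenvalue over random sample points: an estimate of kappa
kappa_est = min(np.linalg.eigvalsh(hessian(W, 3.0 * rng.normal(size=2))).min()
                for _ in range(200))
print(round(kappa_est, 3))   # close to 0.9 = 1 - 0.1
```

For a perturbation strong enough to destroy convexity, the sampled minimum would dip below zero and (H3) would fail.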

We shall often assume that \(\mu ^{\mathrm {in}}\) and \(\mu ^{\mathrm {fin}}\) have the same mean:

$$\begin{aligned} \int _{{\mathbb {R}}^d} x\, \mu ^{\mathrm {in}}(\mathrm {d}x) = \int _{{\mathbb {R}}^d} x\, \mu ^{\mathrm {fin}}(\mathrm {d}x). \end{aligned}$$
(H4)

Remark 1.3

Assumption (H3) is a classical one ensuring exponential convergence rates for the McKean–Vlasov dynamics. It may be weakened in various ways, see the work [12] by Carrillo, McCann and Villani or the more recent [3] by Bolley, Gentil and Guillin, for instance. It is an interesting question to determine which of the results of this section still hold in the more general setup. Hypothesis (H4) can be easily removed using the fact that the mean evolves linearly along any Schrödinger bridge (see Lemma 4.2 below). We insist that the only key assumption is (H3).

Long time behavior of mean field games The articles [5, 7,8,9] study the asymptotic behavior of dynamic mean field games, showing convergence towards an ergodic mean field game with exponential rates. Following [33], we can associate to (17) an ergodic PDE system with unknowns \((\lambda ,\psi ,\mu )\). This PDE system expresses the optimality conditions for the ergodic control problem corresponding to (10). It is easy to see that \((0,0,\mu _{\infty })\) is a solution of that ergodic system. Therefore, we are addressing the same questions studied in the above-mentioned articles. However, the equations we are looking at are quite different. A fundamental difference is that the coupling terms in (10) are not monotone in the sense of [6, Eq.(7) p. 8].

1.4.1 Exponential convergence to equilibrium and the turnpike property

A key step towards the forthcoming quantitative estimates is to consider the time-reversed version of our mean field Schrödinger problem. For \(\mathrm {Q}\in {\mathcal {P}}(\Omega )\) the time reversal \({\hat{\mathrm {Q}}}\) is the law of the time reversed process \((X_{T-t})_{t\in [0,T]}\). In Lemma 4.5 we prove that if \(\mathrm {P}\) is an optimizer for (MFSP), then \({\hat{\mathrm {P}}}\) optimizes

$$\begin{aligned} \inf \left\{ {\mathcal {H}}(\mathrm {Q}| \Gamma (\mathrm {Q}) ) \,:\, \mathrm {Q}\in {\mathcal {P}}_{1}(\Omega ),\, \mathrm {Q}_0=\mu ^{\mathrm {fin}},\, \mathrm {Q}_T=\mu ^{\mathrm {in}} \right\} . \end{aligned}$$
(19)

The optimality of \({\hat{\mathrm {P}}}\) implies the existence of an associated process \({{\hat{\Psi }}}\) as described in Theorem 1.3. We show in Theorem 1.6 below that the function

$$\begin{aligned}{}[0,T]\ni t\mapsto {\mathbb {E}}_{\mathrm {P}}[\Psi _t(X_t)\cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t})] \end{aligned}$$
(20)

is constant; we denote its value by \({\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\) and call it the conserved quantity. Naturally this quantity also depends on T, but we omit this from the notation.

Theorem 1.4 confirms the intuition that mean field Schrödinger bridges are localized around \(\mu _{\infty }\), providing an explicit upper bound for \({\mathcal {F}}(\mathrm {P}_{t})\) along any MFSB, where

$$\begin{aligned} {\mathcal {F}}(\mu ) = \tilde{{\mathcal {F}}}(\mu )-\tilde{{\mathcal {F}}}(\mu _{\infty }). \end{aligned}$$
(21)

We recall that \(\mu _{\infty }\) is found by minimizing \(\tilde{{\mathcal {F}}}\) among all elements of \({\mathcal {P}}_2({\mathbb {R}}^d)\) whose mean is the same as \(\mu \). If \(\tilde{{\mathcal {F}}}\) is thought of as a free energy, then \({\mathcal {F}}\) should be thought of as a divergence (from equilibrium). A graphical illustration of Theorem 1.4 and the turnpike property is provided in the appendix.

Theorem 1.4

Assume (H1)–(H4) and let \(\mathrm {P}\) be an optimizer for (MFSP). For all \(t\in [0,T]\) we have

$$\begin{aligned} {\mathcal {F}}(\mathrm {P}_t)\le & {} \frac{\sinh (2\kappa (T-t))}{\sinh (2\kappa T)}\Big ({\mathcal {F}}(\mu ^{\mathrm {in}})-\frac{{\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})}{2\kappa } \Big )\nonumber \\&\quad +\frac{\sinh (2\kappa t)}{\sinh (2\kappa T)}\Big ( {\mathcal {F}}(\mu ^{\mathrm {fin}})-\frac{{\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})}{2\kappa }\Big ) + \frac{{\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})}{2\kappa }. \end{aligned}$$
(22)

Moreover, for all fixed \(\theta \in (0,1)\) there exists a decreasing function \(B(\cdot )\) such that

$$\begin{aligned} {\mathcal {F}}(\mathrm {P}_{\theta T})\le B(\kappa )({\mathcal {F}}(\mu ^{\mathrm {in}}) +{\mathcal {F}}(\mu ^{\mathrm {fin}})) \exp (-2 \kappa \min \{\theta , 1-\theta \} T) \end{aligned}$$
(23)

uniformly in \(T \ge 1\).

In particular, since \({\mathcal {F}}(\mathrm {P}_{\theta T})\) dominates \({\mathcal {W}}_2(\mathrm {P}_{\theta T},\mu _{\infty })\) (see e.g. [12, Thm 2.2 (ii)]), we obtain that \(\mathrm {P}_{\theta T}\) converges exponentially to \(\mu _{\infty }\) with exponential rate proportional to \(\kappa \). The proof of (22) proceeds by bounding the second derivative of the function \(t\mapsto {\mathcal {F}}(\mathrm {P}_t)\) along Schrödinger bridges with the help of the logarithmic Sobolev inequality established in [12]. To obtain (23) from (22) we use a functional inequality for the conserved quantity and a Talagrand inequality for \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\), which are the content of Theorem 1.6 and Corollary 1.3 below. It is worth mentioning that the estimates (22) and (23) (as well as (32) below) appear to be new even for the classical Schrödinger bridge problem and were not anticipated by the heuristic articles [28, 34]. Conversely, the above-mentioned estimates admit a geometrical interpretation in the framework of Otto calculus that allows one to formally extend their validity to the whole class of problems studied in [28].
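To see concretely how (22) produces (23), one can evaluate the right-hand side of (22) at \(t=\theta T\) for increasing horizons. The values of \(\kappa \), \(\theta \) and of the boundary divergences below are illustrative, and the conserved quantity is set to zero:

```python
import math

kappa, theta = 1.0, 0.25
F_in, F_fin, E = 1.0, 1.0, 0.0     # illustrative boundary divergences; E = conserved quantity

def rhs22(t, T):
    # right-hand side of the bound (22)
    a = math.sinh(2 * kappa * (T - t)) / math.sinh(2 * kappa * T)
    b = math.sinh(2 * kappa * t) / math.sinh(2 * kappa * T)
    c = E / (2 * kappa)
    return a * (F_in - c) + b * (F_fin - c) + c

# compare with the exponential envelope of (23) at t = theta*T
ratios = [rhs22(theta * T, T) / math.exp(-2 * kappa * min(theta, 1 - theta) * T)
          for T in (2.0, 4.0, 8.0)]
print(ratios)   # bounded and approaching 1
```

The ratio of the bound to the envelope \(\exp (-2\kappa \min \{\theta ,1-\theta \}T)\) stays bounded as T grows, which is exactly the content of (23).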

Remark 1.4

The exponential rate in (23) has a sharp dependence on \(\kappa \). To see this, fix \(\mu ^{\mathrm {in}}\) and choose \(\mu ^{\mathrm {fin}}=\mathrm {P}^{\mathrm {MKV}}_T\). Then it is easy to see that the restriction of \(\mathrm {P}^{\mathrm {MKV}}\) to the interval [0, T] is an optimizer for (MFSP). Setting \(\theta =1/2\) and considering (23) for \(T=2t\) we arrive at

$$\begin{aligned} \forall t\ge 1/2, \quad {\mathcal {F}}(\mathrm {P}^{\mathrm {MKV}}_t) \le B(\kappa ) \exp (-2\kappa t). \end{aligned}$$

Thus, we obtain the same exponential rate as in [12], which is easily seen to be optimal under the assumption that W is \(\kappa \)-convex. A similar argument can be used to show the optimal dependence of the rate on \(\theta \).

In the previous theorem we showed that, when looking at a timescale proportional to T, the marginal distribution of any Schrödinger bridge is exponentially close to \(\mu _{\infty }\). Here we show that, for a fixed value of t, we have exponential convergence towards the law of the McKean–Vlasov dynamics \(\mathrm {P}^{\mathrm {MKV}}\), see (5).

Theorem 1.5

Assume (H1)–(H4) and let \(\mathrm {P}\) be an optimizer for (MFSP). For all \(t\in [0,T]\) we have

$$\begin{aligned}&{\mathcal {W}}_2^2(\mathrm {P}_t,\mathrm {P}^{\mathrm {MKV}}_t) \nonumber \\&\quad \le 2t \left( \, \frac{{\mathcal {F}}(\mu ^{\mathrm {in}})}{\exp (2\kappa T)-1} + \frac{\exp (2\kappa T)-\exp (2\kappa (T-t))}{ \exp (2\kappa (T-t))-1} \frac{{\mathcal {F}}(\mu ^{\mathrm {fin}})}{\exp (2\kappa T)-1} \right) . \end{aligned}$$
(24)

In particular, the above theorem shows that \({\mathcal {W}}_2^2(\mathrm {P}_t,\mathrm {P}^{\mathrm {MKV}}_t)\) decays asymptotically at least as fast as \(\exp (-2\kappa T)\) when T is large.

1.4.2 Functional inequalities for the mean field entropic cost

It is well known that analysing the evolution of entropy-like functionals along the so-called displacement interpolations of optimal transport has far-reaching consequences in terms of functional inequalities [55]. Since (MFSP) provides an alternative way of interpolating between probability measures, it is tempting to see if it leads to new functional inequalities involving the cost \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\). Here, we present a Talagrand and an HWI inequality, which we used to study the long time behavior of MFSBs. They generalize their respective counterparts in [42, 48]. Both inequalities are based on another upper bound for the evolution of \({\mathcal {F}}\) along MFSBs, whose presentation we postpone to Theorem 4.1.

The following Talagrand inequality states that the mean field entropic cost grows at most linearly with \({\mathcal {F}}\):

Corollary 1.3

(A Talagrand inequality) Assume (H1)–(H4). Then for all \(T>0\) we have

$$\begin{aligned} \forall t\in (0,T),\quad {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\le & {} \frac{1}{\exp (2\kappa t)-1} {\mathcal {F}}(\mu ^{\mathrm {in}}) \nonumber \\&+ \frac{\exp (2\kappa (T-t))}{\exp (2\kappa (T-t))-1}{\mathcal {F}}(\mu ^{\mathrm {fin}}). \end{aligned}$$
(25)

In particular, choosing \(\mu ^{\mathrm {fin}}=\mu _{\infty }\) leads to

$$\begin{aligned} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu _{\infty }) \le \frac{1}{\exp (2\kappa T)-1} {\mathcal {F}}(\mu ^{\mathrm {in}}). \end{aligned}$$
(26)

Unlike the classical case, in the entropic HWI inequality the Wasserstein distance is replaced by the conserved quantity \({\mathscr {E}}_{\mathrm {P}}\) in the first term on the right-hand side and by the mean field entropic cost in the second term. An extra positive contribution \(\frac{1}{4}{\mathcal {I}}_{{\mathcal {F}}}\) is present in the first term. Our interpretation is that this compensates for the fact that in the “gain” term we put the cost \({\mathscr {C}}_T\), which is larger than the squared Wasserstein distance. In order to state the HWI inequality, we introduce the nonlinear Fisher information functional \({\mathcal {I}}_{{\mathcal {F}}}\), defined for \(\mu \in {\mathcal {P}}_2({\mathbb {R}}^d)\) by

$$\begin{aligned} {\mathcal {I}}_{{\mathcal {F}}}(\mu ) = {\left\{ \begin{array}{ll} \int _{{\mathbb {R}}^d} \Big |\nabla \log \mu (x) + 2\nabla W *\mu (x)\Big |^2 \mu (\mathrm {d}x), \quad &{} \text{ if } \nabla \log \mu \in L^2_{\mu }, \\ +\infty \quad &{} {\text{ otherwise, }} \end{array}\right. } \end{aligned}$$
(27)

where by \(\nabla \log \mu \in L^2_{\mu }\) we mean that \(\mu \ll \lambda \) and that \(\log \mu \) is an absolutely continuous function on \({\mathbb {R}}^d\) whose derivative is in \(L^2_{\mu }\). The nonlinear Fisher information can be seen to be equal to the derivative of the free energy \(\tilde{{\mathcal {F}}}\) along the marginal flow of the McKean–Vlasov dynamics.
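To see where this comes from, take for concreteness the normalization \(\tilde{{\mathcal {F}}}(\mu )=\frac{1}{2}\int \log \mu \,\mathrm {d}\mu +\frac{1}{2}\int W*\mu \,\mathrm {d}\mu \) (the multiplicative constant below depends on this convention), assume W even, and let \(\mu _t\) solve the McKean–Vlasov Fokker–Planck equation \(\partial _t\mu _t=\frac{1}{2}\Delta \mu _t+\nabla \cdot \big ((\nabla W*\mu _t)\mu _t\big )\). A formal integration by parts then gives

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\tilde{{\mathcal {F}}}(\mu _t) = \int \Big (\frac{1}{2}\log \mu _t + W*\mu _t\Big )\,\partial _t\mu _t\,\mathrm {d}x = -\int \Big |\frac{1}{2}\nabla \log \mu _t + \nabla W*\mu _t\Big |^2\,\mu _t(\mathrm {d}x) = -\frac{1}{4}\,{\mathcal {I}}_{{\mathcal {F}}}(\mu _t), \end{aligned}$$

where the middle equality uses \(\partial _t\mu _t=\nabla \cdot \big ((\frac{1}{2}\nabla \log \mu _t+\nabla W*\mu _t)\mu _t\big )\); in particular, \(\tilde{{\mathcal {F}}}\) decreases along the flow at a rate proportional to \({\mathcal {I}}_{{\mathcal {F}}}\).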

Corollary 1.4

(An HWI inequality) Assume (H1)–(H3) and choose \(\mu ^{\mathrm {fin}}=\mu _{\infty }\). If \(\mathrm {P}\) is an optimizer for (MFSP) and \(t\mapsto {\mathcal {I}}_{{\mathcal {F}}}(\mathrm {P}_t)\) is continuous in a right neighborhood of 0, then

$$\begin{aligned} {\mathcal {F}}(\mu ^{\mathrm {in}})\le & {} \frac{1-\exp (-2\kappa T)}{2\kappa } \left( {\mathcal {I}}_{{\mathcal {F}}}(\mu ^{\mathrm {in}}) \left( \frac{1}{4}{\mathcal {I}}_{{\mathcal {F}}}(\mu ^{\mathrm {in}}) - {\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu _{\infty }) \right) \right) ^{1/2} \nonumber \\&- (1-\exp (-2\kappa T)) {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu _{\infty }). \end{aligned}$$
(28)

It is worth noticing that by letting \(T\rightarrow +\infty \) in the above HWI inequality we obtain the logarithmic Sobolev inequality [12, Thm 2.2]. Indeed, \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu _{\infty })\) is always non negative and we shall see in Theorem 1.6 below that \({\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu _{\infty }) \rightarrow 0\). The short time regime is also interesting. Indeed, if \(W=0\), then \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu _{\infty })\) is the standard entropic cost and, under suitable hypotheses on \(\mu ^{\mathrm {in}}\) (see [40]), we have

$$\begin{aligned} \lim _{T\rightarrow 0} T {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu _{\infty })= \frac{1}{2} {\mathcal {W}}_2^2(\mu ^{\mathrm {in}},\mu _{\infty }). \end{aligned}$$
(29)

The heuristic arguments put forward in [28] suggest that (29) remains true even when W is a general potential satisfying (H1). In the same vein, one also expects that

$$\begin{aligned} \lim _{T \rightarrow 0} \frac{T^2}{4}{\mathcal {I}}_{{\mathcal {F}}}(\mu ^{\mathrm {in}}) - T^2 {\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu _{\infty }) = {\mathcal {W}}^2_2(\mu ^{\mathrm {in}},\mu _{\infty }). \end{aligned}$$
(30)

Putting (29) and (30) together, we obtain a heuristic justification of the fact that in the limit \(T\rightarrow 0\), (28) becomes the classical HWI inequality put forward in [12], namely

$$\begin{aligned} {\mathcal {F}}(\mu ^{\mathrm {in}}) \le {\mathcal {W}}_2(\mu ^{\mathrm {in}},\mu _{\infty }) {\mathcal {I}}_{{\mathcal {F}}}(\mu ^{\mathrm {in}})^{1/2} - \kappa {\mathcal {W}}^2_2(\mu ^{\mathrm {in}},\mu _{\infty }). \end{aligned}$$

Our last result is a functional inequality that establishes a hierarchical relation between the conserved quantity and the mean field entropic cost: the former is exponentially small in T and \(\kappa \) in comparison with the latter. We may refer to this as an energy-transport inequality since the conserved quantity may be geometrically interpreted as the total energy of a physical system (cf. [17, Corollary 1.1]).

Theorem 1.6

Assume (H1)–(H4) and let \(\mathrm {P}\) be an optimizer. Then the function

$$\begin{aligned}{}[0,T]\ni t\mapsto {\mathbb {E}}_{\mathrm {P}}[\Psi _t(X_t)\cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t})] \end{aligned}$$
(31)

is constant. Denoting this constant by \({\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\), we have

$$\begin{aligned} |{\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})| \le \frac{4\kappa }{\exp (\kappa T)-1} \left( {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}){\mathscr {C}}_T(\mu ^{\mathrm {fin}},\mu ^{\mathrm {in}}) \right) ^{1/2}. \end{aligned}$$
(32)

In general the term \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}){\mathscr {C}}_T(\mu ^{\mathrm {fin}},\mu ^{\mathrm {in}})\) in (32) cannot be simplified further, since typically \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\ne {\mathscr {C}}_T(\mu ^{\mathrm {fin}},\mu ^{\mathrm {in}})\). For example, \({\mathscr {C}}_T(\delta _0,\nu )=0\) if \(\nu \) is the law at time T of the unconstrained McKean–Vlasov SDE started at zero, whereas \({\mathscr {C}}_T(\nu ,\delta _0)>0\), as it takes effort to drive such an SDE back to zero.

2 Connections with optimal transport

In this section we shall see how the results of this article relate to the Riemannian calculus on \({\mathcal {P}}_2({\mathbb {R}}^d)\) introduced by Otto [41], at least formally. The reader not interested in optimal transport per se is encouraged to skip this section on a first reading. The link is rooted in a seemingly novel connection between (McKean–Vlasov) FBSDEs and second order ODEs in the Riemannian manifold of optimal transport that we find of independent interest. To better understand this connection, let us begin by recalling that in the seminal article [31] it is proven that the marginal flow of the trajectorial SDE

$$\begin{aligned} \mathrm {d}X_t = - \nabla U(X_t)\mathrm {d}t +\mathrm {d}B_t \end{aligned}$$
(33)

can be interpreted as the gradient flow of the entropy functional

$$\begin{aligned} \mu \mapsto \frac{1}{2}\int _{{\mathbb {R}}^d} \log \mu (x) \mu (\mathrm {d}x) +\int _{{\mathbb {R}}^d} U(x)\mu (\mathrm {d}x) \end{aligned}$$

w.r.t. the 2-Wasserstein metric. Thus, first order Itô SDEs provide probabilistic representations for first order ODEs in the Riemannian manifold of optimal transport. Of course, since a path measure is not fully determined by its one-time marginals, the SDE (33) contains more information than the gradient flow equation. It has been shown in [17] that the marginal flow of a classical Schrödinger bridge satisfies a second order ODE, more precisely a Newton's law in which the acceleration field is the Wasserstein gradient of the Fisher information functional. The natural question is then: what trajectorial (second order) SDE governs the dynamics of a Schrödinger bridge and yields a probabilistic representation for the associated Newton's law? In order to answer this, let us first recall some notions of Otto calculus.

2.1 Second order calculus on \({\mathcal {P}}_2({\mathbb {R}}^d)\)

In the next lines, we sketch the ideas behind the Riemannian calculus on \({\mathcal {P}}_2({\mathbb {R}}^d)\). It would be impossible to provide a self-contained introduction in this work and we refer to [53] or [29] for detailed accounts. The main idea is to equip \({\mathcal {P}}_2({\mathbb {R}}^d)\) with a Riemannian metric such that the associated geodesic distance is \({\mathcal {W}}_2(\cdot ,\cdot )\). To do this, one begins by identifying the tangent space \({\mathcal {T}}_{\mu }{\mathcal {P}}_2\) at \(\mu \in {\mathcal {P}}_2({\mathbb {R}}^d)\) with the closure in \(L^2_{\mu }\) of the space of smooth gradient vector fields

$$\begin{aligned} {\mathcal {T}}_{\mu }{\mathcal {P}}_2= \overline{\{ \nabla \varphi , \varphi \in {\mathcal {C}}^{\infty }_c({\mathbb {R}}^d)\}}^{L^2_{\mu }}. \end{aligned}$$

The velocity (first derivative) of a sufficiently regular curve \([0,T]\ni t\mapsto \mu _t\in {\mathcal {P}}_2({\mathbb {R}}^d)\) is then defined as the unique solution \(v_t(x)\) of the continuity equation

$$\begin{aligned} \partial _t \mu _t + \nabla \cdot (v_t \mu _t)=0 \end{aligned}$$

such that \(v_t\in {\mathcal {T}}_{\mu _t}{\mathcal {P}}_2\) for all \(t\in [0,T]\). Finally, the Riemannian metric (Otto metric) \(\langle \cdot ,\cdot \rangle _{{\mathcal {T}}_{\mu }{\mathcal {P}}_2}\) is defined by

$$\begin{aligned} \langle \nabla \varphi ,\nabla \psi \rangle _{{\mathcal {T}}_{\mu }{\mathcal {P}}_2} = \int _{{\mathbb {R}}^d}\nabla \varphi \cdot \nabla \psi (x) \, \mu (\mathrm {d}x). \end{aligned}$$
(34)

It can be seen that the constant speed geodesic curves associated to the Riemannian metric we have introduced coincide with the displacement interpolations of optimal transport and that the corresponding geodesic distance is indeed \({\mathcal {W}}_2(\cdot ,\cdot )\). This makes it possible to carry out several explicit calculations. In particular, we can compute the gradient \(\mathrm {grad}^{{\mathcal {W}}_2}{\mathcal {F}}\) and the Hessian \(\mathrm {Hess}^{{\mathcal {W}}_2}{\mathcal {F}}\) of a smooth functional \({\mathcal {F}}:{\mathcal {P}}_2({\mathbb {R}}^d)\rightarrow {\mathbb {R}}\). At least formally, we have

$$\begin{aligned} \langle \mathrm {grad}^{{\mathcal {W}}_2}{\mathcal {F}},\nabla \varphi \rangle _{{\mathcal {T}}_{\mu }{\mathcal {P}}_2}&= \frac{\mathrm {d}}{\mathrm {d}h} {\mathcal {F}}((id+h \nabla \varphi )_{\#}\mu ) \Big |_{h=0}\\ \langle \nabla \varphi , \mathrm {Hess}^{{\mathcal {W}}_2}_{\mu } {\mathcal {F}}(\nabla \varphi )\rangle _{{\mathcal {T}}_{\mu }{\mathcal {P}}_2}&= \frac{\mathrm {d}^2}{\mathrm {d}h^2} {\mathcal {F}}((id+h \nabla \varphi )_{\#}\mu ) \Big |_{h=0}, \end{aligned}$$

where we used the notation \(\#\) for the push forward. In particular, setting \(W=0\) for simplicity in (27) we obtain that the classical Fisher information functional \({\mathcal {I}}\) has a gradient that can be computed with the rules above. One obtains that (cf. [54])

$$\begin{aligned} \mathrm {grad}^{{\mathcal {W}}_2}{\mathcal {I}}(\mu ) = -2\nabla \Delta \log \mu - \nabla |\nabla \log \mu |^2. \end{aligned}$$

The Levi-Civita connection associated to the Riemannian metric (34) can also be explicitly computed with the help of the orthogonal projection operator \(\Pi _{\mu }:L^2_{\mu } \rightarrow {\mathcal {T}}_{\mu }{\mathcal {P}}_2\). To do this, consider a regular curve \((\mu _t)_{t\in [0,T]}\) with velocity \((v_t)_{t\in [0,T]}\) and a tangent vector field \(t\mapsto u_t\in {\mathcal {T}}_{\mu _t}{\mathcal {P}}_2\) along \((\mu _t)_{t\in [0,T]}\). It turns out that if one defines the covariant derivative \(\frac{\mathbf {D}}{\mathrm {d}t}u_t\) of \((u_t)_{t\in [0,T]}\) along \((\mu _t)_{t\in [0,T]}\) as the vector field

$$\begin{aligned} \frac{\mathbf {D}}{\mathrm {d}t}u_t = \Pi _{\mu _t}\left( \partial _t u_t + \mathrm {D} u_t \cdot v_t \right) \end{aligned}$$

then this covariant derivative satisfies the compatibility with the metric and the torsion-free identity, i.e. it is the Levi-Civita connection. The acceleration of the curve \((\mu _t)_{t\in [0,T]}\) is then the covariant derivative of the velocity along the curve, i.e.

$$\begin{aligned} \frac{\mathbf {D}}{\mathrm {d}t}v_t = \partial _t v_t + \frac{1}{2}\nabla |v_t|^2. \end{aligned}$$
(35)
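The absence of the projection \(\Pi _{\mu _t}\) in (35) comes from the fact that, for a gradient velocity field \(v_t=\nabla \phi _t\), the transport term is itself a gradient: \(\mathrm {D} v_t\cdot v_t=\nabla ^2\phi _t\,\nabla \phi _t=\frac{1}{2}\nabla |\nabla \phi _t|^2\). This pointwise identity can be checked symbolically, here in two dimensions with a generic potential (a sketch using sympy):

```python
import sympy as sp

x, y = sp.symbols('x y')
phi = sp.Function('phi')(x, y)

v = sp.Matrix([sp.diff(phi, x), sp.diff(phi, y)])  # gradient field v = grad(phi)
Dv = v.jacobian([x, y])                            # Jacobian of v = Hessian of phi
lhs = Dv * v                                       # transport term Dv . v
speed2 = (v.T * v)[0]                              # |v|^2
rhs = sp.Matrix([sp.diff(speed2 / 2, x),
                 sp.diff(speed2 / 2, y)])          # (1/2) grad |v|^2
print(sp.simplify(lhs - rhs))                      # zero vector: no projection needed
```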

2.2 Newton’s laws and FBSDEs

According to the above discussion, Newton's law in \(({\mathcal {P}}_2({\mathbb {R}}^d), \langle .,.\rangle _{{\mathcal {T}}_{\cdot }{\mathcal {P}}_2})\)

$$\begin{aligned} {\left\{ \begin{array}{ll}\frac{\mathbf {D}}{\mathrm {d}t} v_t = \frac{1}{8} \mathrm {grad}^{{\mathcal {W}}_2}{\mathcal {I}}(\mu _t)\\ \mu _{0}=\mu ^{\mathrm {in}}, \mu _T=\mu ^{\mathrm {fin}}\end{array}\right. } \end{aligned}$$
(36)

provides a geometric interpretation for the PDE system (see [17] for more details)

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \mu _t(x) + \nabla \cdot (\nabla \phi _t(x) \mu _t(x))=0 \\ \partial _t \nabla \phi _t(x) + \frac{1}{2}\nabla |\nabla \phi _t(x)|^2 = -\frac{1}{4}\nabla \Delta \log \mu _t(x) - \frac{1}{8}\nabla |\nabla \log \mu _t(x)|^2 \\ \mu _0=\mu ^{\mathrm {in}},\mu _T=\mu ^{\mathrm {fin}}, \end{array}\right. } \end{aligned}$$
(37)

where to derive the latter equation we observe that the requirement that \(v_t\in {\mathcal {T}}_{\mu _t}{\mathcal {P}}_2\) for all \(t\in [0,T]\) is formally equivalent to \(v_t=\nabla \phi _t \) for some time dependent potential \((t,x)\mapsto \phi _t(x)\).

As we have seen in Sect. 1.3.1, solutions of the FBSDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm {d}X_t= Y_t\mathrm {d}t+ \mathrm {d}B_t\\ \mathrm {d}Y_t = Z_t\cdot \mathrm {d}B_t\\ X_0\sim \mu ^{\mathrm {in}},\, X_T\sim \mu ^{\mathrm {fin}}, \end{array}\right. } \end{aligned}$$
(38)

having the additional property that \(Y_t=\nabla \psi _t(X_t)\) yield a probabilistic representation for

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \mu _t(x) - \frac{1}{2}\Delta \mu _t(x)+ \nabla \cdot (\nabla \psi _t(x)\mu _t(x))=0,\\ \partial _t \nabla \psi _t(x)+ \frac{1}{2}\nabla \Delta \psi _t(x)+\nabla ^2 \psi _t(x) \cdot \nabla \psi _t(x)=0,\\ \mu _0(x) = \mu ^{\mathrm {in}}(x), \mu _T(x) = \mu ^{\mathrm {fin}}(x). \end{array}\right. } \end{aligned}$$
(39)

Some tedious though standard calculations allow one to see that the change of variable \(\phi _t= -\frac{1}{2}\log \mu _t + \psi _t\) transforms the PDE system (39) into (37). Summing up, we have obtained the following

Informal statement We have:

  1. (i)

    If \((X_t,Y_t,Z_t)_{t\in [0,T]}\) is a solution of the FBSDE (38) such that \(Y_t=\nabla \psi _t(X_t)\) for some time-varying potential \(\psi \), then the marginal flow \((\mu _t)_{t\in [0,T]}\) of \(X_t\) is a solution of Newton's law (36).

  2. (ii)

    If \(\mathrm {P}\) is the (classical) Schrödinger bridge between \(\mu ^{\mathrm {in}}\) and \(\mu ^{\mathrm {fin}}\), then under \(\mathrm {P}\) the canonical process \((X_t)_{t\in [0,T]}\) is such that there exist processes \((Y_t)_{t\in [0,T]}\), \((Z_t)_{t\in [0,T]}\) with the property that \((X_t,Y_t,Z_t)_{t\in [0,T]}\) is a solution of (38) and \(Y_t\) is as in (i).

We leave it to future work to prove a rigorous version of the informal statement above. On the formal level, there is no conceptual difficulty in extending it to include the interaction potential W. Essentially, the only difference is that one has to deal with the non linear Fisher information functional \({\mathcal {I}}_{{\mathcal {F}}}\) instead of \({\mathcal {I}}\).
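The "tedious though standard calculations" behind the change of variable \(\phi _t=-\frac{1}{2}\log \mu _t+\psi _t\) can be delegated to a computer algebra system. The sketch below works in one space dimension, eliminates the time derivatives of \(\mu \) and \(\psi \) using the two equations of (39) (for \(\psi \), up to a spatially constant term, which does not affect gradients), and checks that the residual \(\partial _t\partial _x\phi +\frac{1}{2}\partial _x\big ((\partial _x\phi )^2\big )+\frac{1}{4}\partial _x^3\log \mu +\frac{1}{8}\partial _x\big ((\partial _x\log \mu )^2\big )\) vanishes identically:

```python
import sympy as sp

t, x = sp.symbols('t x')
mu = sp.Function('mu', positive=True)(t, x)
psi = sp.Function('psi')(t, x)
L = sp.log(mu)

# time derivatives prescribed by the two equations of (39):
mu_t = sp.diff(mu, x, 2) / 2 - sp.diff(sp.diff(psi, x) * mu, x)   # Fokker-Planck
psi_t = -sp.diff(psi, x, 2) / 2 - sp.diff(psi, x)**2 / 2          # HJB, up to a constant

phi_x = sp.diff(-L / 2 + psi, x)
phi_t = -mu_t / (2 * mu) + psi_t       # since phi = -log(mu)/2 + psi

# residual of the second equation of the Newton's-law system:
residual = (sp.diff(phi_t, x) + sp.diff(phi_x**2, x) / 2
            + sp.diff(L, x, 3) / 4 + sp.diff(sp.diff(L, x)**2, x) / 8)
print(sp.simplify(residual))           # -> 0
```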

Besides its intrinsic interest, the parallelism between Newton's laws and FBSDEs is very useful when studying the long time behavior of the latter. Indeed, the Riemannian structure underlying (36) allows one to find tractable expressions for the first and second derivative of entropy-like functionals along the marginal flow of the FBSDE.

Remark 2.1

Classical Schrödinger bridges are \(h\)-transforms in the sense of Doob [22]. Therefore, one can also describe their dynamics with a first order SDE and a PDE that encodes the evolution of the drift field. This is not, strictly speaking, a probabilistic representation of (36), since there is already a PDE involved. Our FBSDE approach may be viewed as a way to interpret in a trajectorial sense the PDE governing the drift in the \(h\)-transform representation.

3 The mean field Schrödinger problem and its equivalent formulations: proofs

In this part we complement the discussion undertaken in Sect. 1.2 and provide the proofs of the results stated therein. This section is organized into four subsections, as follows:

  • Section 3.1 contains the proof of Theorem 3.1, which generalizes Theorem 1.1, along with several useful lemmas,

  • Section 3.2 is where we prove Proposition 1.1, Lemma 1.1 and Theorem 1.3.

  • Theorem 1.2 is proven in Sect. 3.3.

  • Finally, Corollaries 1.1 and 1.2 are proven in Sect. 3.4.

In the whole section, apart from Sect. 3.1, which has its own assumptions, we always assume that (H1) and (H2) are in force, even if we do not write them down explicitly in the statements of the lemmas and propositions.

3.1 A large deviations principle for particles interacting through their drifts

We consider for \(N\in {\mathbb {N}}\) the interacting particle system

$$\begin{aligned} \left\{ \begin{array}{rl} \mathrm {d}X_t^{i,N}=&{}\frac{1}{N} \sum \limits _{k=1 }^Nb\left( t,X^{i,N},X^{k,N}\right) \mathrm {d}t+\mathrm {d}B^i_t \\ X_0^{i,N}\sim &{} \mu ^{\mathrm {in}},\,\,\, i=1,\ldots ,N. \end{array} \right. \end{aligned}$$

where \(\{B^i:i=1,\ldots , N\}\) are independent Brownian motions and \(\{X_0^{i,N}:i=1,\ldots , N\}\) are independent of each other and of the Brownian motions. Regarding the drift b, we assume

$$\begin{aligned}&[0,T]\times \Omega \times \Omega \ni (t,\omega ,{\bar{\omega }})\mapsto b(t,\omega ,{\bar{\omega }})\in {\mathbb {R}}^d \text { is progressively measurable,} \end{aligned}$$
(40)
$$\begin{aligned}&|b(t,\omega ^1,{\bar{\omega }}^1)-b(t,\omega ^2,{\bar{\omega }}^2)|\le C\left\{ \sup _{s\le t}|\omega ^1_s-\omega _s^2|+ \sup _{s\le t}|{\bar{\omega }}^1_s-{\bar{\omega }}_s^2|\right\} \end{aligned}$$
(41)
$$\begin{aligned}&\int _0^T|b(s,0,0)|\mathrm {d}s\le C, \end{aligned}$$
(42)

for some constant \(C>0\) and all \((t,\omega ^1,\omega ^2,{\bar{\omega }}^1,{\bar{\omega }}^2)\in [0,T]\times \Omega ^4\). Finally, regarding the measure \(\mu ^{\mathrm {in}}\) we assume that

$$\begin{aligned} \int _{{\mathbb {R}}^d}\exp (r|x|^\beta )\mu ^{\mathrm {in}}(\mathrm {d}x)<\infty \text { for all }r>0. \end{aligned}$$
(43)

We stress that the usual theory of stochastic differential equations guarantees strong existence and uniqueness for the above interacting particle system. Furthermore, if \(\mathrm {P}\in {\mathcal {P}}_{1}(\Omega ) \), then the same arguments show that the stochastic differential equation

$$\begin{aligned} \left\{ \begin{array}{rl} \mathrm {d}X_t^{\mathrm {P}}=&{} \left[ \int b\left( t,X^{\mathrm {P}},{\bar{\omega }}\right) \mathrm {P}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}t+\mathrm {d}B_t \\ X_0^{\mathrm {P}}\sim &{} \mu ^{\mathrm {in}}, \end{array} \right. \end{aligned}$$

admits a unique strong solution. We denote by \(\Gamma (\mathrm {P})\) the law of \(X^{\mathrm {P}}\). We can now state the main result of this part, which contains Theorem 1.1 as a very particular case.

Theorem 3.1

Let \(\beta \in [1,2)\) and assume (40), (41), (42), (43). Then the sequence of empirical measures

$$\begin{aligned} \left\{ \frac{1}{N}\sum \limits _{i=1}^N\delta _{X^{i,N}_{\cdot }} :N\in {\mathbb {N}} \right\} , \end{aligned}$$

satisfies an LDP on \({\mathcal {P}}_{\beta }(\Omega )\) equipped with the \({\mathcal {W}}_\beta \)-topology, with good rate function given by

$$\begin{aligned} {\mathcal {P}}_{\beta }(\Omega )\ni \mathrm {P}\mapsto {\mathscr {I}}(\mathrm {P}):=\left\{ \begin{array}{ll} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P})),&{}\quad \mathrm {P}\ll \Gamma (\mathrm {P}), \\ +\infty ,&{}\quad {\text {otherwise}}. \end{array} \right. \end{aligned}$$
(44)

The result is sharp, in that it fails for \(\beta =2\); see [56]. We follow Tanaka’s reasoning [49] in order to establish this large deviations result. We remark that the assumption on exponential moments (43) is only used in the proof of Theorem 3.1, and not in the results preceding this proof.
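For intuition, the particle system above is straightforward to simulate. The sketch below uses Euler–Maruyama in dimension one with the illustrative choice \(b(t,\omega ,{\bar{\omega }})=-(\omega _t-{\bar{\omega }}_t)\), i.e. the drift generated by the quadratic potential \(W(z)=|z|^2/2\) of the introduction (all numerical values are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 500, 200, 2.0            # particles, time steps, horizon
dt = T / K
X = rng.normal(0.0, 5.0, size=N)   # X_0^{i,N} ~ mu_in (here a wide Gaussian)
mean0 = X.mean()
for _ in range(K):
    # (1/N) sum_k b(t, X^i, X^k) = -(X^i - empirical mean) for quadratic W
    X = X - (X - X.mean()) * dt + rng.normal(0.0, np.sqrt(dt), size=N)
print(abs(X.mean() - mean0), X.std())
```

The empirical mean is conserved up to noise of order \(N^{-1/2}\), while the spread contracts towards its equilibrium value: a toy picture of the convergence to \(\mu _{\infty }\) discussed in Sect. 1.4.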

For \(\mathrm {Q}\in {\mathcal {P}}_\beta (\Omega )\) we consider the equation

$$\begin{aligned} Y_t(\omega )=\omega _t+\int _0^t\left[ \int b(s,Y(\omega ),Y({\bar{\omega }})) \mathrm {Q}(\mathrm {d}{\bar{\omega }}) \right] \mathrm {d}s. \end{aligned}$$
(45)

Lemma 3.1

Take \(Y^{(0)}_t(\omega ):=\omega _0\), \(\mathrm {Q}\in {\mathcal {P}}_\beta (\Omega )\), and consider the iterations

$$\begin{aligned} Y_t^{(n+1)}(\omega )=\omega _t+\int _0^t\left[ \int b(s,Y^{(n)}(\omega ),Y^{(n)}({\bar{\omega }})) \mathrm {Q}(\mathrm {d}{\bar{\omega }}) \right] \mathrm {d}s,\quad t\le T. \end{aligned}$$

Then

  1. (a)

    The iteration is well-defined \(\omega \)-by-\(\omega \) (in particular, the \(\mathrm {Q}\)-integrals are well-defined and finite) and in fact \(\sup _n{\mathbb {E}}_\mathrm {Q}\left[ \sup _{t\le T}|Y^{(n)}_t|^\beta \right] \) is finite.

  2. (b)

    For each \(\omega \in \Omega \) the sequence \(\{Y^{(n)}(\omega )\}_{n\in {\mathbb {N}}}\) is convergent in the sup-norm to some limiting continuous path \(Y^{(\infty )}(\omega )\). Further \({\mathbb {E}}_\mathrm {Q}\left[ \sup _{t\le T}|Y^{(\infty )}_t|^\beta \right] <\infty \), \({\mathbb {E}}_\mathrm {Q}\left[ \sup _{t\le T}|Y^{(\infty )}_t-Y^{(n)}_t| \right] \rightarrow 0\), and \(Y^{(\infty )}\) is adapted to the canonical filtration.

Proof

From the Lipschitz assumption on b we first derive

$$\begin{aligned} \sup _{s\le t}|Y^{(n+1)}_s|&\le \sup _{s\le t}|\omega _s|+\int _0^T|b(s,0,0)|\mathrm {d}s +C\int _0^t\sup _{r\le s}|Y^{(n)}_r|\mathrm {d}r \nonumber \\&\quad +C\int _0^t{\mathbb {E}}_\mathrm {Q}\left[ \sup _{r\le s}|Y^{(n)}_r|\right] \mathrm {d}r. \end{aligned}$$
(46)

Raising this to the power \(\beta \), taking expectations and using Jensen's inequality, we derive

$$\begin{aligned} {\mathbb {E}}_\mathrm {Q}\left[ \sup _{s\le t}|Y^{(n+1)}_s|^\beta \right] \le C'\left( 1+ {\mathbb {E}}_\mathrm {Q}\left[ \sup _{s\le T}|\omega _s|^\beta \right] +\int _0^t{\mathbb {E}}_\mathrm {Q}\left[ \sup _{r\le s}|Y^{(n)}_r|^\beta \right] \mathrm {d}r \right) , \end{aligned}$$

where \(C'\) only depends on T and \(\beta \). From this we establish for some \(R\ge 0\) that

$$\begin{aligned} \sup _n {\mathbb {E}}_\mathrm {Q}\left[ \sup _{s\le t}|Y^{(n)}_s|^\beta \right] \le Re^{Rt}. \end{aligned}$$

Now denote \(\Delta ^{n}_t:=\sup _{s\le t}|Y^{(n)}_s-Y^{(n-1)}_s|\). Again by the Lipschitz property

$$\begin{aligned} \Delta ^{n+1}_t\le C\int _0^t \left\{ \Delta ^{n}_s + {\mathbb {E}}_\mathrm {Q}[\Delta ^{n}_s] \right\} \mathrm {d}s, \end{aligned}$$

which we can bootstrap to obtain

$$\begin{aligned} \Delta ^{n+1}_t+{\mathbb {E}}_\mathrm {Q}[\Delta ^{n+1}_t] \le 3C\int _0^t \left\{ \Delta ^{n}_s + {\mathbb {E}}_\mathrm {Q}[\Delta ^{n}_s] \right\} \mathrm {d}s. \end{aligned}$$

Observe that \(\Delta _t^1\le 2\sup _{s\le T}|\omega _s-\omega _0|+C\), so from the above inequality we obtain by induction that \(\Delta ^{n+1}_t+{\mathbb {E}}_\mathrm {Q}[\Delta ^{n+1}_t] \le C'' \frac{(3Ct)^n}{n!}\). Hence \(\{\Delta ^{n}_T+{\mathbb {E}}_\mathrm {Q}[\Delta ^{n}_T]\}_{n\in {\mathbb {N}}}\) is (for each \(\omega \)) summable in n, so the same holds for \(\{\Delta ^{n}_T\}_{n\in {\mathbb {N}}}\), and therefore the uniform limit of the \(Y^{(n)}\) exists for all \(\omega \). We denote this limit by \(Y^{(\infty )}\). By Fatou’s lemma \({\mathbb {E}}_\mathrm {Q}\left[ \sup _{t\le T}|Y^{(\infty )}_t|^\beta \right] <\infty \). Since \(\left( {\mathbb {E}}_\mathrm {Q}[\Delta ^{n}_T]\right) _{n\in {\mathbb {N}}}\) is summable, we must also have \({\mathbb {E}}_\mathrm {Q}\left[ \sup _{t\le T}|Y^{(\infty )}_t-Y^{(n)}_t| \right] \rightarrow 0\). Finally, since each \(Y^{(n)}\) is clearly adapted, so is \(Y^{(\infty )}\).

\(\square \)

Lemma 3.2

For any \(\mathrm {Q}\in {\mathcal {P}}_{\beta }\) there exists a unique adapted continuous process satisfying (45) pointwise. Denoting this process by \(Y^\mathrm {Q}\), we further have

$$\begin{aligned} \mathrm {Q}\circ (Y^\mathrm {Q})^{-1}\in {\mathcal {P}}_\beta (\Omega ). \end{aligned}$$

Proof

If X and Y are solutions, then the Lipschitz assumption on b implies

$$\begin{aligned} {\mathbb {E}}_\mathrm {Q}\left[ \sup _{s\le t}|Y_s-X_s| \right] \le K \int _0^t {\mathbb {E}}_\mathrm {Q}\left[ \sup _{r\le s}|Y_r-X_r| \right] \mathrm {d}r, \end{aligned}$$

so from Grönwall we derive \({\mathbb {E}}_\mathrm {Q}\left[ \sup _{s\le T}|Y_s-X_s| \right] =0\). With this, and using again the Lipschitz assumption on b, we find

$$\begin{aligned} \sup _{s\le t}|Y_s-X_s| \le K \int _0^t \sup _{r\le s}|Y_r-X_r| \mathrm {d}r, \end{aligned}$$

so by Grönwall we deduce that \(X=Y\) pointwise. For the existence of a solution we employ Point (b) of Lemma 3.1, taking limits in the iterations therein (the exchange of limit and integral is justified by the Lipschitz property of b). Finally \(\mathrm {Q}\circ (Y^\mathrm {Q})^{-1}\in {\mathcal {P}}_\beta (\Omega )\) follows by Point (b) of Lemma 3.1 too. \(\square \)
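The Picard scheme behind Lemmas 3.1 and 3.2 is easy to illustrate numerically. In the sketch below, all specific choices are illustrative assumptions: the drift \(b(y,\bar{y})=-(y-\bar{y})\) is a hypothetical Markovian, Lipschitz stand-in for the path-dependent b of (45), and \(\mathrm {Q}\) is the empirical measure of finitely many simulated Brownian paths. The successive sup-norm gaps \(\Delta ^n_T\) then decay at the factorial rate established above.

```python
import numpy as np

# Illustrative sketch only: b(y, ybar) = -(y - ybar) is a hypothetical
# Lipschitz drift standing in for the path-dependent b of (45), and Q is the
# empirical measure of finitely many simulated Brownian paths.
def b(y, ybar):
    return -(y - ybar)

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 200, 50
dt = T / n_steps
dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
omega = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Picard iteration Y^{(n+1)}_t = omega_t + int_0^t b(Y^{(n)}_s, E_Q[Y^{(n)}_s]) ds
Y = omega.copy()
gaps = []  # sup-norm distances Delta^{n+1}_T between successive iterates
for _ in range(25):
    mean_path = Y.mean(axis=0)      # E_Q[Y^{(n)}_s] on the time grid
    drift = b(Y, mean_path)
    integral = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(drift[:, :-1] * dt, axis=1)], axis=1)
    Y_next = omega + integral
    gaps.append(np.abs(Y_next - Y).max())
    Y = Y_next
```

After a couple of dozen iterations the gap is at floating-point level, consistent with the \(t^n/n!\) bound.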

Thanks to this result we can define the operator

$$\begin{aligned} \begin{aligned} \Theta :&({\mathcal {P}}_\beta ,{\mathcal {W}}_\beta ) \rightarrow ({\mathcal {P}}_\beta ,{\mathcal {W}}_\beta )\\&\,\,\,\,\mathrm {Q}\mapsto \Theta (\mathrm {Q}):=\mathrm {Q}\circ (Y^\mathrm {Q})^{-1}, \end{aligned} \end{aligned}$$
(47)

where \(Y^\mathrm {Q}\) denotes the unique solution of (45).

Lemma 3.3

\(Y^{\mathrm {R}^{\mu ^{\mathrm {in}}}}\) is the unique strong solution to the McKean–Vlasov SDE

$$\begin{aligned} \left\{ \begin{array}{rl} \mathrm {d}Z_t=&{} \left[ \int b\left( t,Z,{\bar{\omega }}\right) \mathrm {P}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}t+\mathrm {d}B_t \\ Z\sim &{} \mathrm {P},\, Z_0\sim \mu ^{\mathrm {in}}. \end{array} \right. \end{aligned}$$

Furthermore, if \(\{X^{i,N}:i\le N, N\in {\mathbb {N}}\}\) is the aforementioned interacting particle system, driven by the independent Brownian motions \(\{B^i:i\in {\mathbb {N}}\}\) with initial law \(\mu ^{\mathrm {in}}\), then

$$\begin{aligned} \Theta \Bigg (\frac{1}{N}\sum _{i=1}^N\delta _{B^i_{\cdot }} \Bigg )= \frac{1}{N}\sum _{i=1}^N\delta _{X^{i,N}_{\cdot }},\,\, a.s. \end{aligned}$$
(48)

Proof

That \(Y^{\mathrm {R}^{\mu ^{\mathrm {in}}}}\) is a solution to the McKean–Vlasov SDE is clear since \(\omega \) is a Brownian motion under \(\mathrm {R}^{\mu ^{\mathrm {in}}}\). That the solution is unique follows by observing that the drift in this SDE is Lipschitz jointly in Z and \(P=\text {Law}(Z)\), from which the usual arguments apply. For the second point, consider first continuous paths \(\omega ^1,\dots ,\omega ^N\) and define \( \mathrm {Q}=\frac{1}{N}\sum _{i=1}^N\delta _{\omega ^i}\). Then for all \(1\le i\le N\) we have

$$\begin{aligned} Y^\mathrm {Q}_t(\omega ^i)=\omega ^i_t + \int _0^t\Bigg (\frac{1}{N}\sum \limits _{k\le N}b(s,Y^\mathrm {Q}(\omega ^i),Y^\mathrm {Q}(\omega ^k)) \Bigg )\mathrm {d}s. \end{aligned}$$

Replacing the deterministic paths \(\omega ^1,\dots ,\omega ^N\) by those of \(B^1,\dots ,B^N\) we conclude.

\(\square \)
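Identity (48) can also be verified numerically. In the sketch below all specific choices are illustrative assumptions: the quadratic pair potential \(W(x)=|x|^2/2\) (so \(\nabla W(x)=x\) and the interaction drift is \(-(x-\text {mean})\), which satisfies (H1)) and \(\mu ^{\mathrm {in}}=\delta _0\) for simplicity. On a time grid, the Euler scheme for the particle system (1) coincides path by path with the fixed point of (45) computed from the empirical measure of the driving Brownian paths.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_steps, T = 20, 100, 1.0
dt = T / n_steps
dW = rng.normal(scale=np.sqrt(dt), size=(N, n_steps))
B = np.concatenate([np.zeros((N, 1)), np.cumsum(dW, axis=1)], axis=1)

def drift(x):
    # -(1/N) sum_k grad W(x^i - x^k) with the illustrative grad W(u) = u,
    # i.e. -(x - mean(x)) across the N particles
    return -(x - x.mean())

# (a) Euler scheme for the interacting particle system (1)
X = np.zeros((N, n_steps + 1))
for k in range(n_steps):
    X[:, k + 1] = X[:, k] + drift(X[:, k]) * dt + dW[:, k]

# (b) fixed point of (45) with Q the empirical law of the driving paths,
#     via Picard iteration (the discrete system is triangular in time, so
#     n_steps iterations reach the exact fixed point)
Y = B.copy()
for _ in range(n_steps):
    integral = np.zeros_like(Y)
    for k in range(n_steps):
        integral[:, k + 1] = integral[:, k] + drift(Y[:, k]) * dt
    Y = B + integral

gap = np.abs(X - Y).max()   # identity (48): both sides agree path by path
```

The gap is of rounding order, reflecting that (48) is an exact identity and not an approximation.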

The key observation is that \(\frac{1}{N}\sum _{i=1}^N\delta _{B^i} \) satisfies a large deviations principle in \({\mathcal {P}}_\beta (\Omega )\) equipped with the \({\mathcal {W}}_\beta \) topology, with good rate function given by the relative entropy \({\mathcal {H}}(\cdot |\mathrm {R}^{\mu ^{\mathrm {in}}})\). This is true for \(\beta <2\) under our exponential moments assumption (43), but fails for \(\beta =2\), as follows easily from [56]. By Lemma 3.3 we may then derive, via the contraction principle [21, Theorem 4.2.1], a large deviations principle for

$$\begin{aligned} \Bigg \{\frac{1}{N}\sum \limits _{i=1}^N\delta _{X^{i,N}} :N\in {\mathbb {N}} \Bigg \}, \end{aligned}$$

provided we establish the continuity of \(\Theta \). This is our next step.

Lemma 3.4

\(\Theta \) is Lipschitz-continuous and injective.

Proof

We first prove the Lipschitz property. Let \(\pi \) be a coupling with first marginal \(\mathrm {Q}\) and second marginal \(\mathrm {P}\). Denoting by \((\omega ,{\bar{\omega }})\) the canonical process on \(\Omega \times \Omega \), and using the Lipschitz assumption on b, we have

$$\begin{aligned} {\mathbb {E}}_\pi \left[ \sup _{s\le t}|Y_s^\mathrm {Q}(\omega )-Y_s^\mathrm {P}({\bar{\omega }})|^\beta \right]\le & {} K\int _0^t {\mathbb {E}}_\pi \left[ |Y_s^\mathrm {Q}(\omega )-Y_s^\mathrm {P}({\bar{\omega }})|^\beta \right] \mathrm {d}s\\&+{\mathbb {E}}_\pi \left[ \sup _{s\le t}|\omega _s-{\bar{\omega }}_s|^\beta \right] . \end{aligned}$$

By Grönwall we have

$$\begin{aligned} {\mathbb {E}}_\pi \left[ \sup _{s\le T}|Y_s^\mathrm {Q}(\omega )-Y_s^\mathrm {P}({\bar{\omega }})|^\beta \right] \le K' {\mathbb {E}}_\pi \left[ \sup _{s\le T}|\omega _s-{\bar{\omega }}_s|^\beta \right] , \end{aligned}$$

so taking infimum over such \(\pi \) we conclude that

$$\begin{aligned} {\mathcal {W}}_\beta (\Theta (\mathrm {Q}),\Theta (\mathrm {P}))\le K' {\mathcal {W}}_\beta (\mathrm {Q},\mathrm {P}). \end{aligned}$$

We now prove that \(\Theta \) is injective. Let \(\mathrm {P}=\Theta (\mathrm {Q})=\Theta ({{\hat{\mathrm {Q}}}})\). By definition we have \(\mathrm {Q}\)-a.s.

$$\begin{aligned} \omega _t&= Y_t^\mathrm {Q}(\omega )-\int _0^t\left[ \int b(s,Y^\mathrm {Q}(\omega ),Y^\mathrm {Q}({\bar{\omega }})) \mathrm {Q}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}s \\&= Y_t^\mathrm {Q}(\omega )-\int _0^t\left[ \int b(s,Y^\mathrm {Q}(\omega ),{\bar{\omega }}) \mathrm {P}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}s, \end{aligned}$$

and the same holds for \({{\hat{\mathrm {Q}}}}\) instead of \(\mathrm {Q}\). Denoting

$$\begin{aligned} F(\omega ):=\omega -\int _0^\cdot \left[ \int b(s,\omega ,{\bar{\omega }}) \mathrm {P}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}s, \end{aligned}$$

we therefore have

$$\begin{aligned} \omega _t&= F(Y^\mathrm {Q})_t\,\, (\mathrm {Q}-a.s.),\\ \omega _t&= F(Y^{{{\hat{\mathrm {Q}}}}})_t\,\, ({{\hat{\mathrm {Q}}}}-a.s.). \end{aligned}$$

Hence \(\mathrm {Q}=\Theta (\mathrm {Q})\circ (F)^{-1}=\mathrm {P}\circ (F)^{-1}=\Theta ({{\hat{\mathrm {Q}}}})\circ (F)^{-1}={{\hat{\mathrm {Q}}}}\). \(\square \)

We can now provide the proof of Theorem 3.1:

Proof of Theorem 3.1

As we have observed, if \(\{B^i:i\in {\mathbb {N}}\}\) is an i.i.d. sequence of \(\mathrm {R}^{\mu ^{\mathrm {in}}}\)-distributed processes, then \(\frac{1}{N}\sum _{i=1}^N\delta _{B^i_{\cdot }} \) satisfies a large deviations principle in \({\mathcal {P}}_\beta (\Omega )\) equipped with the \({\mathcal {W}}_\beta \) topology, with good rate function given by the relative entropy \({\mathcal {H}}(\cdot |\mathrm {R}^{\mu ^{\mathrm {in}}})\). By (48), and since \(\Theta :({\mathcal {P}}_\beta ,{\mathcal {W}}_\beta ) \rightarrow ({\mathcal {P}}_\beta ,{\mathcal {W}}_\beta )\) is continuous, the contraction principle establishes that \(\{\frac{1}{N}\sum _{i=1}^N\delta _{X^{i,N}_{\cdot }} :N\in {\mathbb {N}} \}\) satisfies a large deviations principle in \({\mathcal {P}}_\beta (\Omega )\) equipped with the \({\mathcal {W}}_\beta \) topology. Since \(\Theta \) is injective, the good rate function is given by

$$\begin{aligned} {\tilde{{\mathscr {I}}}}(\mathrm {P}):=\left\{ \begin{array}{ll} {\mathcal {H}}(\Theta ^{-1}(\mathrm {P})|\mathrm {R}^{\mu ^{\mathrm {in}}})&{} \text { if } \mathrm {P}\in \text { range} (\Theta ) \\ +\infty &{}\text { otherwise}. \end{array} \right. \end{aligned}$$

In fact observe that if \(\mathrm {P}\in \text {range}(\Theta )\) and \(\Theta ^{-1}(\mathrm {P})\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\) then \(\mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\), so

$$\begin{aligned} {\tilde{{\mathscr {I}}}}(\mathrm {P})=\left\{ \begin{array}{ll} {\mathcal {H}}(\Theta ^{-1}(\mathrm {P})|\mathrm {R}^{\mu ^{\mathrm {in}}})&{} \text { if } \mathrm {P}\in \text { range }(\Theta )\text { and } \mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}} \\ +\infty &{}\text { otherwise}. \end{array} \right. \end{aligned}$$

Now take \(\mathrm {P}\in \) range\((\Theta )\) and call \(\mathrm {Q}=\Theta ^{-1}(\mathrm {P})\). It is immediate by the definition of \(\Gamma (\cdot )\) that \(\Gamma (\mathrm {P})=\mathrm {R}^{\mu ^{\mathrm {in}}}\circ (Y^\mathrm {Q})^{-1}\). On the other hand observe that the filtration generated by \(Y^\mathrm {Q}\) is equal to the canonical filtration: indeed \(Y^\mathrm {Q}\) is adapted and conversely

$$\begin{aligned} \omega _t=Y^\mathrm {Q}_t-\int _0^t \left[ \int b(s,Y^\mathrm {Q},{\bar{\omega }})\mathrm {P}(\mathrm {d}{\bar{\omega }}) \right] \mathrm {d}s=:h_t(Y^\mathrm {Q}), \end{aligned}$$

so the canonical process is \(Y^\mathrm {Q}\)-adapted. From this

$$\begin{aligned} \frac{\mathrm {d}\Big (\mathrm {Q}\circ (Y^\mathrm {Q})^{-1}\Big )}{\mathrm {d}\Big (\mathrm {R}^{\mu ^{\mathrm {in}}} \circ (Y^\mathrm {Q})^{-1}\Big )}={\mathbb {E}}_{\mathrm {R}^{\mu ^{\mathrm {in}}}}\left[ \frac{\mathrm {d}\mathrm {Q}}{\mathrm {d}\mathrm {R}^{\mu ^{\mathrm {in}}}} |\sigma (Y^\mathrm {Q})\right] =\frac{\mathrm {d}\mathrm {Q}}{\mathrm {d}\mathrm {R}^{\mu ^{\mathrm {in}}}}\circ h. \end{aligned}$$

Hence

$$\begin{aligned} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))= & {} {\mathcal {H}}(\mathrm {Q}\circ (Y^\mathrm {Q})^{-1}|\mathrm {R}^{\mu ^{\mathrm {in}}} \circ (Y^\mathrm {Q})^{-1})={\mathbb {E}}_{\mathrm {Q}\circ (Y^\mathrm {Q})^{-1}}\left[ \log \frac{\mathrm {d}\mathrm {Q}}{\mathrm {d}\mathrm {R}^{\mu ^{\mathrm {in}}}}\circ h \right] \\= & {} {\mathcal {H}}(\mathrm {Q}|\mathrm {R}^{\mu ^{\mathrm {in}}})={\mathcal {H}}(\Theta ^{-1}(\mathrm {P})|\mathrm {R}^{\mu ^{\mathrm {in}}}), \end{aligned}$$

and therefore

$$\begin{aligned} {\tilde{{\mathscr {I}}}}(\mathrm {P})=\left\{ \begin{array}{ll} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))&{} \text { if } \mathrm {P}\in \text { range }(\Theta ) \text { and } \mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}} \\ +\infty &{}\text { otherwise}. \end{array} \right. \end{aligned}$$

The next step is to show that \(\mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\) implies \(\mathrm {P}\in \) range\((\Theta )\). In fact, denote by \(\tau \) the adapted transformation

$$\begin{aligned} \omega \mapsto \tau _t(\omega )=\omega _t-\int _0^t\int b(s,\omega ,{\bar{\omega }})\mathrm {P}(\mathrm {d}{\bar{\omega }})\mathrm {d}s. \end{aligned}$$

On the other hand, let \(X^{\mathrm {P}}\) denote the unique adapted pointwise solution to

$$\begin{aligned} X_t^{\mathrm {P}}=\omega _0+\int _0^t\left[ \int b\left( s,X^{\mathrm {P}},{\bar{\omega }}\right) \mathrm {P}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}s + \omega _t, \end{aligned}$$

which exists by Lemma 3.2 applied to the drift \(\int b(\cdot ,\cdot ,\bar{\omega })\mathrm {P}(\mathrm {d}\bar{\omega })\). As we recall in Lemma 5.4, \(X^{\mathrm {P}}\) and \(\tau \) are \(\mathrm {P}\)-a.s. inverses if \(\mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\), since the above drift is Lipschitz. Now introduce \(\mathrm {Q}:=\mathrm {P}\circ (\tau )^{-1} \), so that \(\mathrm {Q}\circ (X^{\mathrm {P}})^{-1}=\mathrm {P}\) and in particular

$$\begin{aligned} X_t^{\mathrm {P}}=\omega _0+\int _0^t\left[ \int b\left( s,X^{\mathrm {P}},X^{\mathrm {P}}({\bar{\omega }})\right) \mathrm {Q}(\mathrm {d}{\bar{\omega }})\right] \mathrm {d}s + \omega _t. \end{aligned}$$

By Lemma 3.2 we have \(\Theta (\mathrm {Q})=\mathrm {Q}\circ (Y^{\mathrm {Q}})^{-1}=\mathrm {Q}\circ (X^{\mathrm {P}})^{-1} =\mathrm {P}\).

We have arrived at

$$\begin{aligned} {\tilde{{\mathscr {I}}}}(\mathrm {P})=\left\{ \begin{array}{ll} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))&{}\quad {\text { if }} \mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}} \\ +\infty &{}\quad {{\text {otherwise}}}. \end{array} \right. \end{aligned}$$

To obtain the desired form (44) of the rate function it suffices to use Lemma 5.2 in the Appendix. \(\square \)

3.2 McKean–Vlasov formulation and planning McKean–Vlasov FBSDE

Proof of Lemma 1.1 and Proposition 1.1

Under (H1) for any \(\mathrm {P}\in {\mathcal {P}}_{1}(\Omega )\) the vector field

$$\begin{aligned}{}[0,T]\times {\mathbb {R}}^d\ni (t,x)\mapsto -\nabla W*\mathrm {P}_t(x):=-\int _{{\mathbb {R}}^d}\nabla W(x-z)\mathrm {P}_t(\mathrm {d}z), \end{aligned}$$

is very well behaved. More precisely:

Lemma 3.5

Let \(\mathrm {P}\in {\mathcal {P}}_{1}(\Omega )\) and grant (H1). Then the time-dependent vector field \((t,x)\mapsto -\nabla W*\mathrm {P}_t(x) \) belongs to \( {\mathcal {C}}^{0,1}([0,T]\times {\mathbb {R}}^d;{\mathbb {R}}^d)\) and is uniformly Lipschitz in the space variable.

Proof

We begin by proving continuity. Fix \((t,x)\) and a sequence \((t_n,x_n) \rightarrow (t,x)\). The sequence \(\nabla W(x_n-X_{t_n})\) converges pointwise to \(\nabla W(x-X_{t})\), since X is the (continuous) canonical process. By the fundamental theorem of calculus and (H1) we have \(|\nabla W(x_n-X_{t_n})|\le C_1+C_2\sup _{s\in [0,T]}|X_s|\). Since \(\mathrm {P}\in {\mathcal {P}}_{1}(\Omega )\), we may use dominated convergence to conclude \({\mathbb {E}}_{\mathrm {P}}[\nabla W(x_n-X_{t_n})] \rightarrow {\mathbb {E}}_{\mathrm {P}}\left[ \nabla W(x-X_{t})\right] \). The Lipschitz property of \(-\nabla W *\mathrm {P}_t\) in the space variable follows from (H1), and space differentiability follows similarly from (H1) and dominated convergence. \(\square \)
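For a concrete instance of Lemma 3.5, one may take the hypothetical quadratic potential \(W(x)=|x|^2/2\) in dimension one (which satisfies (H1)); then \(-\nabla W*\mathrm {P}_t(x)=-(x-m_t)\), with \(m_t\) the mean of \(\mathrm {P}_t\), which is 1-Lipschitz in x uniformly in t. A minimal numerical sketch with an empirical \(\mathrm {P}\):

```python
import numpy as np

# Minimal sketch with illustrative choices: W(x) = x^2/2 in dimension one and
# P_t the empirical law (at grid time t) of a few simulated sample paths.
rng = np.random.default_rng(5)
n_paths, n_t = 100, 50
paths = np.cumsum(rng.normal(size=(n_paths, n_t)), axis=1)

def field(t_idx, x):
    # (t, x) -> -grad W * P_t(x) = -mean over z of (x - z), with z ~ P_t
    return -np.mean(x - paths[:, t_idx])

# empirical Lipschitz constant in the space variable, uniformly over grid times
xs = rng.normal(size=20)
lip = max(
    abs(field(t, x) - field(t, y)) / abs(x - y)
    for t in range(n_t) for x in xs for y in xs if x != y
)
```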

We will often make use of the next technical lemma, whose proof we defer to the appendix:

Lemma 3.6

Let \(\mu \in {\mathcal {P}}_2({\mathbb {R}}^d)\) and \({{\bar{b}}} \) be of class \({\mathcal {C}}^{0,1}([0,T] \times {\mathbb {R}}^d;{\mathbb {R}}^d)\) and such that

$$\begin{aligned} \forall t\in [0,T], \, x,y\in {\mathbb {R}}^d\quad |{{\bar{b}}}(t,x)-{{\bar{b}}}(t,y)|\le C|x-y| \end{aligned}$$
(49)

for some \(C<+\infty \). Define \(\bar{\mathrm {R}}\) as the law of the SDE

$$\begin{aligned} \mathrm {d}X_t = {{\bar{b}}}(t,X_t) \mathrm {d}t + \mathrm {d}B_t, \quad X_0 \sim \mu \end{aligned}$$
(50)

and let \(\mathrm {P}\in {\mathcal {P}}(\Omega )\) with \(X_0 \sim \mu \). The following are equivalent

  1. (i)

    \({\mathcal {H}}(\mathrm {P}|\bar{\mathrm {R}})<+\infty \).

  2. (ii)

    There exists a \(\mathrm {P}\)-a.s. defined adapted process \((\bar{\alpha }_t)_{t\in [0,T]}\) such that

    $$\begin{aligned} {\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |\bar{\alpha }_t|^2 \mathrm {d}t \right] <+\infty \end{aligned}$$
    (51)

    and

    $$\begin{aligned} X_t - \int _{0}^t[ {{\bar{b}}}(s,X_s) + \bar{\alpha }_s]\,\mathrm {d}s \end{aligned}$$
    (52)

    is a Brownian motion under \(\mathrm {P}\).

Moreover, if (i), or equivalently (ii), holds, then we have

$$\begin{aligned} {\mathcal {H}}(\mathrm {P}| \bar{\mathrm {R}} ) = \frac{1}{2} {\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^T |\bar{\alpha }_t|^2 \mathrm {d}t \right] \end{aligned}$$
(53)

and

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}}\left[ \,\sup _{t\in [0,T]} |X_t|^2+|{{\bar{b}}}(t,X_t)|^2\,\right] <+\infty . \end{aligned}$$
(54)

In particular, if (i), or equivalently (ii), holds we have that \(\mathrm {P}\in {\mathcal {P}}_2(\Omega )\).
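Identity (53) can be sanity-checked in the simplest possible case, under purely illustrative assumptions: \(\bar{b}\equiv 0\) (so \(\bar{\mathrm {R}}\) is Wiener measure started at 0), dimension one, and a constant drift \(\bar{\alpha }_t\equiv a\). Girsanov then gives \(\frac{\mathrm {d}\mathrm {P}}{\mathrm {d}\bar{\mathrm {R}}}=\exp (aX_T-a^2T/2)\), so \({\mathcal {H}}(\mathrm {P}|\bar{\mathrm {R}})={\mathbb {E}}_{\mathrm {P}}[aX_T-a^2T/2]\), which a Monte Carlo estimate matches against the value \(a^2T/2\) predicted by (53):

```python
import numpy as np

# Sketch under illustrative assumptions: bar b = 0, constant drift a, X_0 = 0,
# dimension one. Under P the terminal value satisfies X_T ~ N(aT, T).
a, T, n_mc = 0.7, 2.0, 400_000
rng = np.random.default_rng(3)
X_T = a * T + np.sqrt(T) * rng.normal(size=n_mc)   # samples of X_T under P
H_mc = np.mean(a * X_T - a**2 * T / 2)             # E_P[log dP/d(bar R)]
H_formula = a**2 * T / 2                           # prediction of (53)
```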

We turn to proving Lemma 1.1 stated in the introduction:

Proof of Lemma 1.1

Define the vector field \({{\bar{b}}}(t,z):= - \nabla W *\mathrm {P}_t (z)\). Lemma 3.5 grants that \({{\bar{b}}}\) fulfills the hypotheses of Lemma 3.6, giving the desired conclusions. \(\square \)

We can now prove Proposition 1.1 of the introduction, concerning the existence of MFSBs. Recall the definition of \(\Gamma (\mathrm {P})\) and (MFSP) from the introduction.

Proof of Proposition 1.1

Let \(\mathrm {R}^{\mu ^{\mathrm {in}}}\) be the law of the Brownian motion started at \(\mu ^{\mathrm {in}}\). (H2) grants that the classical Schrödinger problem (namely, the one with respect to Brownian motion) is admissible. To see this, it suffices to verify that the coupling \(\mu ^{\mathrm {in}}\otimes \mu ^{\mathrm {fin}}\) is admissible for the static version of the Schrödinger problem [36, Def 2.2] and then use the equivalence between the static and dynamic versions [36, Prop 2.3]. Therefore, there exists some \(\mathrm {P}\in {\mathcal {P}}(\Omega )\) such that \(\mathrm {P}_0=\mu ^{\mathrm {in}}\) and \({\mathcal {H}}(\mathrm {P}|\mathrm {R}^{\mu ^{\mathrm {in}}})<+\infty \). Lemma 3.6 (or its specialization Lemma 5.1 in the appendix) yields that \(\mathrm {P}\in {\mathcal {P}}_{1}(\Omega )\). On the other hand, Lemma 5.2 in the appendix proves that for any \(\mathrm {P}\in {\mathcal {P}}(\Omega )\) we have \({\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))<+\infty \) if and only if \({\mathcal {H}}(\mathrm {P}|\mathrm {R}^{\mu ^{\mathrm {in}}})<+\infty \). Thus (MFSP) is admissible as well. Now observe that \(\mathrm {P}\mapsto {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))\) is lower semicontinuous in \({\mathcal {P}}_\beta (\Omega )\): on the one hand the relative entropy is jointly lower semicontinuous in the weak topology, and on the other hand \(\Gamma \) is readily seen to be continuous in \({\mathcal {P}}_{1}(\Omega )\). Recalling the definition of the operator \(\Theta \) given in (47), to finish the proof we only need to justify that

$$\begin{aligned} \theta _M:=\{\mathrm {P}\in {\mathcal {P}}_1(\Omega ):\,{\mathcal {H}}(\Theta ^{-1}(\mathrm {P})|\mathrm {R}^{\mu ^{\mathrm {in}}})\le M,\, \mathrm {P}_0=\mu ^{\mathrm {in}}\}, \end{aligned}$$

is relatively compact in \({\mathcal {P}}_1(\Omega )\) for each M, since the proof of Theorem 3.1 established that \({\mathcal {H}}(\Theta ^{-1}(\mathrm {P})|\mathrm {R}^{\mu ^{\mathrm {in}}})= {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))\) if \(\mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\). Now remark that

$$\begin{aligned} \theta _M\subset \Theta \left( \{\mathrm {Q}:{\mathcal {H}}(\mathrm {Q}|\mathrm {R}^{\mu ^{\mathrm {in}}})\le M, \mathrm {Q}_0=\mu ^{\mathrm {in}}\}\right) \subset \Theta \left( \{\mathrm {Q}:{\mathcal {H}}(\mathrm {Q}|\mathrm {R}^{\gamma })\le {{\bar{M}}}, \mathrm {Q}_0=\mu ^{\mathrm {in}}\}\right) , \end{aligned}$$

where \(\gamma \) denotes the standard Gaussian, since by the decomposition of the entropy we have

$$\begin{aligned} {\mathcal {H}}(\mathrm {P}|\mathrm {R}^{\gamma }) = {\mathcal {H}}(\mu ^{\mathrm {in}}|\gamma )+ {\mathcal {H}}(\mathrm {P}|\mathrm {R}^{\mu ^{\mathrm {in}}}), \end{aligned}$$

and by Assumption (H2)

$$\begin{aligned} {\mathcal {H}}(\mu ^{\mathrm {in}}|\gamma )= & {} \int \log \mu ^{\mathrm {in}}(x)\mu ^{\mathrm {in}}(\mathrm {d}x)-\int \log (\gamma (x))\mu ^{\mathrm {in}}(\mathrm {d}x)\\= & {} \int \log \mu ^{\mathrm {in}}(x)\mu ^{\mathrm {in}}(\mathrm {d}x)+c+\int \frac{|x|^2}{2}\mu ^{\mathrm {in}}(\mathrm {d}x) <\infty . \end{aligned}$$

As \(\Theta \) is Lipschitz in \({\mathcal {P}}_1(\Omega )\) by Lemma 3.4, it remains to prove that \(\{{\mathcal {H}}(\mathrm {Q}|\mathrm {R}^{\gamma })\le {{\bar{M}}}\}\) is \({\mathcal {W}}_1\)-compact. This can easily be done by hand, or by invoking Sanov's theorem in the \({\mathcal {W}}_1\)-topology for independent particles distributed according to \(\mathrm {R}^\gamma \) (see e.g. [56]), finishing the proof. \(\square \)

Proof of Theorem 1.3

We split the proof into two propositions, namely Propositions 3.1 and 3.2. We begin by addressing the issue of Markovianity of the minimizers. Recall the definition of \(\mathrm {H}_{-1}((\mu _t)_{t\in [0,T]})\) given under ‘frequently used notation.’ We rely strongly on the work [14] by Cattiaux and Léonard for the proof of the following result:

Proposition 3.1

Let \(\mathrm {P}\) be optimal for (MFSP). Then there exists \(\Psi \in \mathrm {H}_{-1}((\mathrm {P}_t)_{t\in [0,T]})\) such that

$$\begin{aligned} (\mathrm {d}t\times \mathrm {d}\mathrm {P}\text {-a.s.}) \quad \alpha ^{\mathrm {P}}_t= \Psi _t(X_t), \end{aligned}$$
(55)

where \((\alpha ^{\mathrm {P}}_t)_{t\in [0,T]}\) is given in Lemma 1.1.

Proof

If \(\mathrm {P}\) is optimal for (MFSP), then it is also optimal for

$$\begin{aligned} \inf \left\{ {\mathcal {H}}(\mathrm {Q}| \Gamma (\mathrm {P}) ) : \mathrm {Q}\in {\mathcal {P}}_{1}(\Omega ),\, \mathrm {Q}_t = \mathrm {P}_t\text { for all }t\in [0,T]\right\} , \end{aligned}$$
(56)

since \(\Gamma (\mathrm {P})\) only depends on the marginals of \(\mathrm {P}\). The above problem is an instance of [14], i.e., its optimizer is a so-called critical Nelson process. However, the drift of the path measure \(\Gamma (\mathrm {P})\) may not fulfill the hypotheses in [14]. For this reason we need to make a slight detour. Let \(\theta ^n\in {\mathcal {C}}^{\infty }_{c}([0,T]\times {\mathbb {R}}^d)\) and \(\mathrm {R}^n\) be defined as in Lemma 5.3 in the appendix, meaning that \(\nabla \theta ^n_{\cdot }(\cdot )\) converges to \(-\nabla W *\mathrm {P}_{\cdot }(\cdot )\) in \(\mathrm {H}_{-1}((\mathrm {P}_t)_{t\in [0,T]})\) and that \(\mathrm {R}^n\) is the law of

$$\begin{aligned} \mathrm {d}Y_t = \nabla \theta ^n_t(Y_t)\mathrm {d}t + \mathrm {d}B_t, \quad Y_0 \sim \mu ^{\mathrm {in}}\in {\mathcal {P}}_2({\mathbb {R}}^d). \end{aligned}$$

For any n consider the problem

$$\begin{aligned} \min \left\{ {\mathcal {H}}(\mathrm {Q}| \mathrm {R}^n ) :\mathrm {Q}\in {\mathcal {P}}_{1}(\Omega ), \, \mathrm {Q}_t = \mathrm {P}_t\text { for all }t\in [0,T]\right\} . \end{aligned}$$
(57)

Using [14, Lemma 3.1, Theorem 3.6] we obtain that the unique optimizer \(\bar{\mathrm {P}}\) of (57) is the same for all n, and that there exists \(\Phi \in \mathrm {H}_{-1} ((\mathrm {P}_t)_{t\in [0,T]})\) such that

$$\begin{aligned} X_t - \int _{0}^t \Phi _s(X_s) \mathrm {d}s \end{aligned}$$
(58)

is a Brownian motion under \(\bar{\mathrm {P}}\). Lemma 3.5 grants that if we set \({{\bar{b}}}(t,z)= -\nabla W*\mathrm {P}_t(z)\) then the hypotheses of Lemma 3.6 are met. Since \({\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))<+\infty \), we derive from (54) therein that

$$\begin{aligned} {\mathbb {E}}_{\bar{\mathrm {P}}} \left[ \int _{0}^T | \nabla W*\mathrm {P}_t(X_t)|^2\mathrm {d}t\right] ={\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T | \nabla W*\mathrm {P}_t(X_t)|^2\mathrm {d}t\right] <+\infty . \end{aligned}$$

Hence

$$\begin{aligned} {\mathbb {E}}_{\bar{\mathrm {P}}} \left[ \int _{0}^T |\Phi _t(X_t)+ \nabla W*\mathrm {P}_t(X_t)|^2\mathrm {d}t\right] <+\infty . \end{aligned}$$
(59)

Using the implication \((ii)\Rightarrow (i)\) of Lemma 3.6 we finally obtain that \({\mathcal {H}}(\bar{\mathrm {P}}|\Gamma (\mathrm {P}))<+\infty \), and therefore we may apply Lemma 5.3 with the choice \(\mathrm {Q}=\bar{\mathrm {P}}\) therein.

Now consider \(\mathrm {Q}\) admissible for (56) and such that \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {P}))<+\infty \). Using Lemma 5.3 twice we obtain

$$\begin{aligned} {\mathcal {H}}(\bar{\mathrm {P}}|\Gamma (\mathrm {P}))= \liminf _{n\rightarrow +\infty } {\mathcal {H}}(\bar{\mathrm {P}}|\mathrm {R}^n) \le \liminf _{n\rightarrow +\infty } {\mathcal {H}}(\mathrm {Q}|\mathrm {R}^n)={\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {P})). \end{aligned}$$

Thus \(\bar{\mathrm {P}}\) is also an optimizer for (56). But then \(\bar{\mathrm {P}}=\mathrm {P}\), since (56) can have at most one minimizer by strict convexity of the entropy and convexity of the admissible region. Combining (58) with (9) we get that \(\int _{0}^t \left( -\nabla W *\mathrm {P}_s (X_s) + \alpha ^{\mathrm {P}}_s - \Phi _s(X_s)\right) \mathrm {d}s\) is a continuous martingale with finite variation, and hence constant \(\mathrm {P}\)-a.s. The conclusion follows by setting \(\Psi _t(z):= \Phi _t(z)+\nabla W*\mathrm {P}_t(z)\) and observing that \(\nabla W*\mathrm {P}_{\cdot }(\cdot )\in \mathrm {H}_{-1}((\mathrm {P}_t)_{t\in [0,T]})\). \(\square \)

Notice that the above proposition proves the first half of Theorem 1.3 from the introduction. We now establish the second half of this result:

Proposition 3.2

Assume that \(\mathrm {P}\) is optimal for (MFSP). Then \(\Psi _t(X_t)\) has a continuous version adapted to the \(\mathrm {P}\)-augmented canonical filtration, and the process \((M_t)_{t\in [0,T]}\) defined by

$$\begin{aligned} M_t:=\Psi _t(X_t) - \int _{0}^t {\tilde{{\mathbb {E}}}}_{{\tilde{\mathrm {P}}}}\left[ \nabla ^2 W(X_s-{\tilde{X}}_s)\cdot (\Psi _s(X_s) - \Psi _s(\tilde{X}_s) ) \right] \, \mathrm {d}s \end{aligned}$$
(60)

is a continuous martingale under \(\mathrm {P}\) on [0, T[ and satisfies \({\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |M_t|^2 \mathrm {d}t \right] <+\infty \).

To carry out the proof, we will use a well-known characterization of martingales (see e.g. [23]), which is as follows: an adapted process \((M_t)_{t\in [0,T]}\) such that \({\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |M_t|^2 \mathrm {d}t \right] <+\infty \) is a martingale on [0, T[ under \(\mathrm {P}\) if and only if

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^T M_t h_t \mathrm {d}t \right] =0 \end{aligned}$$
(61)

for all adapted processes \((h_t)_{t\in [0,T]}\) such that

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T |h_t|^2 \mathrm {d}t \right] <+\infty , \quad \text {and} \, \int _{0}^T h_t \,\mathrm {d}t=0 \quad \mathrm {P}-\text {a.s.} \end{aligned}$$
(62)

Proof

Define \((M_t)_{t\in [0,T]}\) via (60). Using (H1), (8) and (54) we get that \({\mathbb {E}}_{\mathrm {P}}[\int _{0}^T|M_t|^2\mathrm {d}t] <+\infty \). Therefore, by the characterization of martingales [23, pp. 148–149], in order to show that \((M_t)_{t\in [0,T]}\) is a martingale on [0, T[ we need to show (61) for all adapted processes \((h_t)_{t\in [0,T]}\) satisfying (62). By a standard density argument, it suffices to obtain (61) under the additional assumption that \((h_t)_{t\in [0,T]}\) is bounded and Lipschitz, i.e.

$$\begin{aligned}&\forall t\in [0,T], \, \omega ,\bar{\omega }\in \Omega , \quad \sup _{s\in [0,t]} |h_s(\omega )-h_s(\bar{\omega })|\nonumber \\&\quad \le C \sup _{s\in [0,t]} |\omega _s-\bar{\omega }_s|,\quad \sup _{t\in [0,T]}|h_t(\omega )| \le C, \end{aligned}$$
(63)

for some \(C>0\). Consider now a process \((h_t)_{t\in [0,T]}\) satisfying (62) and (63) and for \(\varepsilon >0\) define the shift transformation

$$\begin{aligned} \tau ^{\varepsilon }:\Omega \longrightarrow \Omega , \quad \tau ^{\varepsilon }_t(\omega ) = \omega _t +\varepsilon \int _{0}^t h_s(\omega ) \mathrm {d}s. \end{aligned}$$
(64)

Under the current assumptions, \(\tau ^{\varepsilon }\) admits an adapted inverse \(Y^{\varepsilon }\), i.e. there exists an adapted process \((Y^{\varepsilon }_{t})_{t\in [0,T]}\) such that

$$\begin{aligned} \mathrm {P}-\text {a.s.}\quad \tau ^{\varepsilon }_t (Y^{\varepsilon } (\omega ))=Y^{\varepsilon }_t (\tau ^{\varepsilon } (\omega )) = \omega _t \quad \forall t\in [0,T]. \end{aligned}$$
(65)

Indeed, since \({\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))<+\infty \), Lemma 5.2 in the appendix yields that \(\mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\); this entitles us to apply Lemma 5.4 in the same section, providing the existence of the inverse \(Y^{\varepsilon }\).

If we set \(\mathrm {P}^{\varepsilon }=\mathrm {P}\circ (\tau ^{\varepsilon })^{-1}\) we have that \(\mathrm {P}^{\varepsilon }\in {\mathcal {P}}_{1}(\Omega )\) is admissible for (MFSP), thanks to (62). Moreover, Lemma 1.1 and (65) imply that

$$\begin{aligned} X_t - \int _{0}^t \Big ( \varepsilon h_s(Y^{\varepsilon })+\Psi _s(Y^{\varepsilon }_s)-\nabla W *\mathrm {P}_s(Y^{\varepsilon }_s) \Big ) \mathrm {d}s \end{aligned}$$

is a Brownian motion under \(\mathrm {P}^{\varepsilon }\). Combining (8), (63) and (H1) we get that

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{\mathrm {P}^{\varepsilon }}\left[ \int _{0}^T\big |\Psi _t(Y^{\varepsilon }_t) +\varepsilon h_t(Y^{\varepsilon }) -\nabla W *\mathrm {P}_t(Y^{\varepsilon }_t)+\nabla W*\mathrm {P}^{\varepsilon }_t(X_t) \big |^2\mathrm {d}t\right] <+\infty .\nonumber \\ \end{aligned}$$
(66)

Lemma 3.5 grants that \({{\bar{b}}}(t,x)=-\nabla W *\mathrm {P}^{\varepsilon }_t(x)\) fulfills the hypotheses of Lemma 3.6, and (66) allows us to use the implication \((ii)\Rightarrow (i)\), which yields that \({\mathcal {H}}(\mathrm {P}^{\varepsilon }|\Gamma (\mathrm {P}^{\varepsilon }))\) is finite and equals the left hand side of (66). Using the definition of \(\mathrm {P}^{\varepsilon }\), we can rewrite \({\mathcal {H}}(\mathrm {P}^{\varepsilon }|\Gamma (\mathrm {P}^{\varepsilon }))\) as

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^T| \varepsilon h_t+\Psi _t(X_t) +\nabla W*\mathrm {P}^{\varepsilon }_t(\tau ^{\varepsilon }_t)-\nabla W *\mathrm {P}_t(X_t) |^2\mathrm {d}t\right] . \end{aligned}$$

Imposing optimality of \(\mathrm {P}\), letting \(\varepsilon \) tend to zero, and using a Taylor expansion, we obtain

$$\begin{aligned} 0\le & {} \liminf _{\varepsilon \rightarrow 0} \frac{{\mathcal {H}}(\mathrm {P}^{\varepsilon }|\Gamma (\mathrm {P}^{\varepsilon }))-{\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P})) }{\varepsilon }\\= & {} {\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T\Psi _t(X_t) \cdot \left( h_t + {\tilde{{\mathbb {E}}}}_{\tilde{\mathrm {P}}} \left[ \nabla ^2 W(X_t-\tilde{X}_t) \cdot \int _{0}^t (h_s-\tilde{h}_s) \mathrm {d}s \right] \right) \mathrm {d}t\right] . \end{aligned}$$

In the above equation, \((\tilde{X}_t,\tilde{h}_t)_{t\in [0,T]}\) is an independent copy of \((X_t,h_t)_{t\in [0,T]}\) defined on some probability space \((\tilde{\Omega },\tilde{{\mathfrak {F}}},\tilde{\mathrm {P}})\) and \(\tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}}\) denotes the expectation on \((\tilde{\Omega },\tilde{{\mathfrak {F}}},\tilde{\mathrm {P}})\). Moreover, the exchange of limit and expectation is justified by (49), (8) and the dominated convergence theorem. Using the symmetry of W, and taking \(\pm h\), we can rewrite the latter condition as

$$\begin{aligned} 0= & {} {\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^T\Psi _t(X_t) \cdot h_t \mathrm {d}t \right] \\&+ {\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T{\tilde{{\mathbb {E}}}}_{\tilde{\mathrm {P}}}\left[ \ (\Psi _t(X_t)-\Psi _t(\tilde{X}_t)) \cdot \nabla ^2 W(X_t-\tilde{X}_t)\right] \cdot \int _{0}^t h_s\mathrm {d}s \, \mathrm {d}t\right] . \end{aligned}$$

By integration by parts and the boundary condition (62), we arrive at

$$\begin{aligned} 0= {\mathbb {E}}_{\mathrm {P}} \left[ \int _{0}^T \left( \Psi _t(X_t) - \int _{0}^t{\tilde{{\mathbb {E}}}}_{\tilde{\mathrm {P}}}\left[ (\Psi _s(X_s)-\Psi _s(\tilde{X}_s)) \cdot \nabla ^2 W(X_s-\tilde{X}_s) \right] \mathrm {d}s \right) \cdot h_t \mathrm {d}t \right] , \end{aligned}$$

proving the desired martingale property. By [45, Theorem IV.36.5] we know that a martingale in an augmented Brownian filtration admits a continuous version. Using again Lemma 5.2 we have that \(\mathrm {P}\ll \mathrm {R}^{\mu ^{\mathrm {in}}}\), and so we obtain a continuous version of our martingale (60), and a fortiori of \(\Psi _t(X_t)\). \(\square \)

3.3 Benamou–Brenier formulation

We finally turn to the Benamou–Brenier formulation. Recall that \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\) denotes the optimal value of the mean field Schrödinger problem. We define the set \({\mathcal {A}}\) as the collection of all absolutely continuous curves \((\mu _t)_{t\in [0,T]}\subset {\mathcal {P}}_2({\mathbb {R}}^d)\) (see Sect. 4.2) such that

$$\begin{aligned} (t,z)\mapsto \nabla \log \mu _t(z)&\in L^2(\mathrm {d}\mu _t\mathrm {d}t),\\ (t,z)\mapsto \nabla W *\mu _t(z)&\in L^2(\mathrm {d}\mu _t\mathrm {d}t). \end{aligned}$$

Recall from the introduction the problem

$$\begin{aligned}&{\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\nonumber \\&\quad :=\inf _{\begin{array}{c} (\mu _t)_{t\in [0,T]}\in {\mathcal {A}},\\ \partial _t\mu _t + \nabla \cdot (w_t\mu _t)=0 \end{array}} \, \frac{1}{2}\int \int \left| w_t(z)+ \frac{1}{2}\nabla \log \mu _t(z)\right. \nonumber \\&\qquad \left. + \nabla W *\mu _t(z)\right| ^2\mu _t(\mathrm {d}z) \mathrm {d}t \end{aligned}$$
(67)

In (67), solutions to the continuity equation \(\partial _t\mu _t + \nabla \cdot (w_t\mu _t)=0\) are meant in the weak sense.

Proof of Theorem 1.2

We first show that \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) \ge {\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\). To this end, we may assume that the l.h.s. is finite and denote by \(\mathrm {P}\) an optimizer. As established in Theorem 1.3, the drift of X under \(\mathrm {P}\) is equal to

$$\begin{aligned} \int _0^t\Psi _s(X_s)-\nabla W*\mathrm {P}_s(X_s)\mathrm {d}s, \end{aligned}$$

where \(\Psi \in \mathrm {H}_{-1}((\mathrm {P}_t)_{t\in [0,T]})\) and

$$\begin{aligned} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) = \frac{1}{2} \int \int |\Psi _t(z)|^2\mathrm {P}_t(\mathrm {d}z)\mathrm {d}t. \end{aligned}$$

As we will see in Lemma 4.4 and Remark 4.1, the flow of marginals \((\mathrm {P}_t)_{t\in [0,T]}\) is absolutely continuous and its tangent velocity field v is given by

$$\begin{aligned} v_t(z):= - \nabla W*\mathrm {P}_t(z)+ \Psi _t(z) -\frac{1}{2} \nabla \log \mathrm {P}_t(z). \end{aligned}$$

Hence

$$\begin{aligned} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})&= \frac{1}{2} \int \int \left| v_t(z)+ \frac{1}{2} \nabla \log \mathrm {P}_t+\nabla W*\mathrm {P}_t(z)\right| ^2\mathrm {P}_t(\mathrm {d}z)\mathrm {d}t. \end{aligned}$$

We conclude the desired inequality by noticing that \( \nabla \log \mathrm {P}_t \in L^2(\mathrm {d}\mathrm {P}_t\mathrm {d}t)\) and \(\nabla W *\mathrm {P}_t \in L^2(\mathrm {d}\mathrm {P}_t\mathrm {d}t)\). To wit, the first statement follows from [25, Thm 3.10] combined with Lemma 5.2 in our appendix, and the second from (54) used with \({\bar{b}}=-\nabla W *\mathrm {P}_t(z)\). We now establish \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) \le {\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\). To this end, we may assume that \((\mu _t)_{t\in [0,T]}\) is feasible for the r.h.s. and leads to a finite value. Denote by \({\tilde{v}}\) its tangent velocity field. We define \(\Phi _t(z):= {\tilde{v}}_t(z)+\frac{1}{2}\nabla \log \mu _t(z)\), so from the continuity equation for \((\mu _t)_{t\in [0,T]}\) we deduce the following equation in the distributional sense

$$\begin{aligned} \partial _t \mu _t+\nabla \cdot (\mu _t \Phi _t)-\frac{1}{2}\Delta \mu _t=0. \end{aligned}$$

Observing that \(\Phi \in \mathrm {H}_{-1}((\mu _t)_{t\in [0,T]})\), we may apply the equivalence “(a) iff (c)” in [14, Theorem 3.4]. We thus obtain a measure \(\mathrm {P}\) whose marginals are exactly \((\mu _t)_{t\in [0,T]}\), and by the uniqueness statement in [14, Theorem 3.4] we also know that the drift of X under \(\mathrm {P}\) is precisely \( \Phi _s(X_s) \). Hence

$$\begin{aligned}&\frac{1}{2}\int \int \left| {\tilde{v}}_t(z)+ \frac{1}{2}\nabla \log \mu _t(z) + \nabla W *\mu _t(z)\right| ^2\mu _t(\mathrm {d}z) \mathrm {d}t \\&\quad = \frac{1}{2}\int \int \left| \Phi _t(z) + \nabla W *\mu _t(z)\right| ^2\mu _t(\mathrm {d}z) \mathrm {d}t \\&\quad = \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _0^T|\Phi _t(X_t)+\nabla W*\mathrm {P}_t(X_t)|^2 \mathrm {d}t\right] \\&\quad \ge {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}), \end{aligned}$$

where the inequality follows from the equivalent expression of \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\) given in (10).

We have proven \({\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) = {\mathscr {C}}_T^{BB}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\), and the other statements follow from the previous arguments. \(\square \)

3.4 Schrödinger potentials and mean field PDE system: proofs

We start with an observation concerning the link between (15) and (16):

Remark 3.1

It is worth stressing that the link between (15) and (16) can be established if the FBSDE solution \(Y_t\) is a gradient vector field depending only on t and \(X_t\). We have gathered preliminary evidence that (15) admits non-Markov solutions even in the simple case when \(W=0\). More precisely, we expect that all processes in the reciprocal class of Brownian motion (meaning that they share the same bridges; see [37] for details) fulfilling the marginal constraints of (1.1) are solutions to (15). This is in contrast with what is expected for standard FBSDEs [10, Lemma 3.5] whose boundary conditions are not of planning type.

We now provide the postponed proofs:

Proof of Corollary 1.1

We know by Theorem 1.1 that \(\Psi \) belongs to \(\mathrm {H}_{-1}((\mu _t)_{t\in [0,T]})\). The regularity hypotheses imposed on \(\Psi _t(x)\) and \(\mu _t(x)\) allow us to conclude that \(\Psi \) is a true gradient, i.e. there exists \(\psi \) such that \(\Psi _t(x)=\nabla \psi _t(x)\) for all \((t,x)\in [0,T]\times {\mathbb {R}}^d\). Lemma 1.1 together with Theorem 1.3 yields that \(\mu _t\) is a weak solution of the Fokker–Planck equation in (17). Because of the regularity assumptions we made on \(\Psi \) and \(\mu \), we can conclude that \(\mu _t\) is indeed a classical solution. For the same reasons, we can turn the martingale condition (1.3) into the system of PDEs

$$\begin{aligned} \forall i= & {} 1,\ldots ,d \quad \partial _t \partial _{x_i}\psi _t(x) + {\mathcal {L}} (\partial _{x_i} \psi _t(x)) \\&- \int _{{\mathbb {R}}^d} \partial _{x_i}\big (\nabla W(x-\tilde{x})\big )\cdot ( \nabla \psi _t(x) -\nabla \psi _t(\tilde{x})) \, \mu _t(\mathrm {d}\tilde{x}) =0, \end{aligned}$$

where \({\mathcal {L}}\) is the generator \(\frac{1}{2}\Delta + (\nabla (-W *\mu _t + \psi _t))\cdot \nabla \). After some tedious but standard calculations we can rewrite the above as

$$\begin{aligned}&\partial _{x_i} \left( \partial _t \psi _{t}(x) + \frac{1}{2}\Delta \psi _{t}(x) + \frac{1}{2} |\nabla \psi _{t}(x)|^2 \right. \\&\quad \left. - \int _{{\mathbb {R}}^d} \nabla W(x-\tilde{x}) \cdot ( \nabla \psi _t(x) -\nabla \psi _t(\tilde{x}))\, \mu _t(\mathrm {d}\tilde{x}) \right) =0. \end{aligned}$$

Since \(\psi \) is defined up to the addition of a function that depends on time only, the conclusion follows. \(\square \)
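When \(W=0\), the “tedious but standard calculations” boil down to the product-rule identity \(\partial _{x_i}\big (\partial _t\psi +\tfrac{1}{2}\Delta \psi +\tfrac{1}{2}|\nabla \psi |^2\big )=\partial _t\partial _{x_i}\psi +{\mathcal {L}}(\partial _{x_i}\psi )\). The following sketch checks this identity in dimension one by finite differences; the test function \(\psi (t,x)=tx^2\) and all function names are illustrative choices, not part of the original argument.

```python
# Finite-difference check (d = 1, W = 0) of the gradient rewriting used above:
#   d/dx( psi_t + 1/2 psi_xx + 1/2 psi_x^2 ) = d/dt psi_x + L(psi_x),
# with generator L = 1/2 d^2/dx^2 + psi_x d/dx (the drift is grad(psi) when W = 0).
# psi(t, x) = t x^2 is a hypothetical test function chosen for illustration.

def psi(t, x):
    return t * x * x

def d_dx(f, t, x, h=1e-3):    # central difference in x
    return (f(t, x + h) - f(t, x - h)) / (2 * h)

def d_dt(f, t, x, h=1e-3):    # central difference in t
    return (f(t + h, x) - f(t - h, x)) / (2 * h)

def d2_dx2(f, t, x, h=1e-3):  # second central difference in x
    return (f(t, x + h) - 2 * f(t, x) + f(t, x - h)) / (h * h)

def psi_x(t, x):
    return d_dx(psi, t, x)

def bracket(t, x):            # psi_t + 1/2 psi_xx + 1/2 psi_x^2
    return d_dt(psi, t, x) + 0.5 * d2_dx2(psi, t, x) + 0.5 * psi_x(t, x) ** 2

for (t, x) in [(0.4, 1.0), (1.1, -0.7), (2.0, 0.3)]:
    lhs = d_dx(bracket, t, x)
    rhs = d_dt(psi_x, t, x) + 0.5 * d2_dx2(psi_x, t, x) + psi_x(t, x) * d_dx(psi_x, t, x)
    assert abs(lhs - rhs) < 1e-5
```

Since \(\psi \) is a polynomial of low degree, the central differences are exact up to floating-point rounding, so both sides agree to high precision.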

Corollary 1.2 can be proven with a direct calculation using the definition of \(\varphi _t\) and (17).

4 Convergence to equilibrium and functional inequalities: proofs

In this part we complement the discussion undertaken in Sect. 1.4 and provide proofs for the results stated therein. This section is organized as follows:

  • Sections 4.1, 4.2, 4.3 are devoted to stating and proving some preparatory results that we shall use at different times in the proofs of the main results.

  • In Section 4.4 we prove Theorem 1.6, and we state and prove Theorem 4.1 together with its corollaries: the Talagrand (Corollary 1.3) and the HWI (Corollary 1.4) inequalities.

  • Finally, in Sect. 4.5 we prove Theorems 1.4 and 1.5.

In all the lemmas and theorems in this section we always assume (H1)–(H2) to hold, and throughout, \(\mathrm {P},\alpha ^{\mathrm {P}},\Psi ,M\) are as given in Theorem 1.3. We refer to Sects. 1.2 and 1.4 for any unexplained notation.

4.1 Exponential upper bound for the corrector

Recall that we called \(\Psi \) the corrector. The goal of this part is to quantify the size of the corrector, as stated in Lemma 4.3 below. Before doing this we prove two preliminary lemmas. As usual, we denote by \(\langle \cdot \rangle \) the quadratic variation of a semimartingale.

Lemma 4.1

We have

$$\begin{aligned} \forall t\in [0,T[, \quad {\mathbb {E}}_{\mathrm {P}}[|M_t|^2]={\mathbb {E}}_{\mathrm {P}}[\langle M\rangle _t]<+\infty . \end{aligned}$$
(68)

Moreover the function \(t\mapsto {\mathbb {E}}_{\mathrm {P}}\left[ \langle M\rangle _t \right] \) is continuous on [0, T[ and

$$\begin{aligned} \forall t\in [0,T[, \quad \sup _{s\in [0,t]}\, {\mathbb {E}}_{\mathrm {P}}[|\Psi _s(X_s)|^2]<+\infty \end{aligned}$$
(69)

Proof

We have shown at Theorem 3.2 that \({\mathbb {E}}_{\mathrm {P}}\left[ \int _0^T|M_t|^2 \mathrm {d}t \right] <+\infty \), which gives that \({\mathbb {E}}_{\mathrm {P}}\left[ |M_t|^2 \right] <+\infty \) for almost every \(t\in [0,T[\). But since \({\mathbb {E}}_{\mathrm {P}}\left[ |M_t|^2 \right] \) is an increasing function of t, we get \({\mathbb {E}}_{\mathrm {P}}\left[ |M_t|^2\right] <+\infty \) for all \(t\in [0,T[\). To complete the proof of (68) it suffices to observe that, by definition of quadratic variation and since M is an \(L^2\)-martingale on [0, T[, we have \({\mathbb {E}}_{\mathrm {P}}\left[ |M_t|^2 \right] ={\mathbb {E}}_{\mathrm {P}}[\langle M\rangle _t]\). To prove the continuity of \(t\mapsto {\mathbb {E}}_{\mathrm {P}}[\langle M\rangle _t]\) we start by observing that since M is a continuous martingale, \(\langle M\rangle \) has continuous and increasing paths. Thus, we obtain by monotone convergence that \({\mathbb {E}}_{\mathrm {P}}[\langle M\rangle _{t\pm h}] \rightarrow {\mathbb {E}}_{\mathrm {P}}[\langle M\rangle _{t}] \) as \(h\downarrow 0\), which gives the desired result. The proof of (69) follows from (60), the boundedness of the Hessian of W (see (H1)) and the first part of Theorem 1.3. \(\square \)

Lemma 4.2

The function \(t\mapsto {\mathbb {E}}_{\mathrm {P}}[X_t]\) is affine, the function \(t\mapsto {\mathbb {E}}_{\mathrm {P}}[\Psi _t(X_t)]\) is constant, and

$$\begin{aligned} \forall t\in [0,T[,\quad {\mathbb {E}}_{\mathrm {P}}[X_t]={\mathbb {E}}_{\mathrm {P}}[X_0] + {\mathbb {E}}_{\mathrm {P}}[\Psi _0(X_0)] t \end{aligned}$$

Proof

Using the symmetry of W and the martingale property (60) it is easily derived that \({\mathbb {E}}_{\mathrm {P}}[\Psi _t(X_t)]\) is constant as a function of t. Therefore we get for all \(t\in [0,T]\)

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}}[X_t]={\mathbb {E}}_{\mathrm {P}}[X_0]-\int _{0}^t {\mathbb {E}}_{\mathrm {P}}[\nabla W*\mathrm {P}_s(X_s)]\mathrm {d}s + {\mathbb {E}}_{\mathrm {P}}[\Psi _0(X_0)] t \end{aligned}$$

Using again the symmetry of W we get that \(\int _{0}^t {\mathbb {E}}_{\mathrm {P}}[\nabla W*\mathrm {P}_s(X_s)]\mathrm {d}s =0\), from which the conclusion follows. \(\square \)
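The cancellation \(\int _{0}^t {\mathbb {E}}_{\mathrm {P}}[\nabla W*\mathrm {P}_s(X_s)]\mathrm {d}s =0\) only uses that the symmetry of W makes \(\nabla W\) odd, so that \({\mathbb {E}}[\nabla W(X-\tilde{X})]=0\) for an iid pair. A short numerical sketch with the hypothetical even potential \(W(x)=x^4/4\) (our choice, purely for illustration):

```python
# Sanity check of the cancellation used above: W symmetric (even) makes
# grad W odd, so the empirical double average of grad W(X - X~) vanishes:
# the (i, j) and (j, i) terms cancel, and the diagonal gives grad W(0) = 0.
import random

def grad_W(d):
    return d * d * d  # derivative of W(x) = x^4 / 4; an odd function

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]

# empirical analogue of E_P[ grad W * P_s (X_s) ] over an iid sample
total = sum(grad_W(x - y) for x in xs for y in xs) / len(xs) ** 2
assert abs(total) < 1e-9
```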

We can now provide some key estimates on the corrector:

Lemma 4.3

Assume (H1)–(H4). If \(\mathrm {P}\) is an optimizer for (MFSP) and \(\Psi \) the associated corrector, then for any \(t\in (0,T)\) we have

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^t |\Psi _s(X_s)|^2\mathrm {d}s\right] \le \frac{\exp (2\kappa t)-1}{\exp (2\kappa T)-1} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}), \end{aligned}$$
(70)

and

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2 \right] \le \frac{2\kappa \, {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})}{\exp (2\kappa (T-t))-1}. \end{aligned}$$
(71)

Proof

Consider the function \(t\mapsto \varphi (t)\) defined by

$$\begin{aligned} \varphi (t) = \frac{1}{2}{\mathbb {E}}_{\mathrm {P}} \left[ \int _0^t |\Psi _s(X_s)|^2 \mathrm {d}s \right] . \end{aligned}$$

Fubini’s theorem allows us to interchange the time integral and the expectation to get that \(\varphi \) is an absolutely continuous function with derivative

$$\begin{aligned} \varphi '(t)={\mathbb {E}}_{\mathrm {P}} \left[ |\Psi _t(X_t)|^2 \right] . \end{aligned}$$
(72)

From Itô’s formula and Theorem 1.3 we get that for all \(t\in [0,T[\)

$$\begin{aligned} |\Psi _t(X_t)|^2-|\Psi _0(X_0)|^2&= 2\int _0^t \Psi _r(X_r) \cdot \mathrm {d}M_r \\&\quad + 2\int _0^t\Psi _r(X_r)\cdot \tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}}[\nabla ^2 W(X_r-\tilde{X}_r)\cdot \\&\quad (\Psi _r(X_r)-\Psi _r(\tilde{X}_r))] \, \mathrm {d}r + \langle M \rangle _t. \end{aligned}$$

We observe that the martingale property of M, together with (68) and (69), ensures that \({\mathbb {E}}_{\mathrm {P}}\left[ \int _0^t \Psi _r(X_r)\cdot \mathrm {d}M_r \right] =0\). Thus, taking expectations on both sides of the above equation yields

$$\begin{aligned} \varphi '(t)-\varphi '(0)= & {} {\mathbb {E}}_{\mathrm {P}}\left[ 2 \int _0^t\Psi _r(X_r)\cdot \tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}}[\nabla ^2 W(X_r-\tilde{X}_r)\cdot (\Psi _r(X_r)-\Psi _r(\tilde{X}_r))] \, \mathrm {d}r\right] \nonumber \\&+ {\mathbb {E}}_{\mathrm {P}}[\langle M \rangle _t]. \end{aligned}$$
(73)

Because of (69) we can use Fubini’s Theorem and write

$$\begin{aligned}&{\mathbb {E}}_{\mathrm {P}}\left[ 2 \int _0^t\Psi _r(X_r)\cdot \tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}}[\nabla ^2 W(X_r-\tilde{X}_r)\cdot (\Psi _r(X_r)-\Psi _r(\tilde{X}_r))] \, \mathrm {d}r\right] \nonumber \\&\quad = 2 \int _0^t {\mathbb {E}}_{\mathrm {P}}\left[ \Psi _r(X_r)\cdot \tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}}[\nabla ^2 W(X_r-\tilde{X}_r)\cdot (\Psi _r(X_r)-\Psi _r(\tilde{X}_r))]\, \right] \mathrm {d}r\nonumber \\&\quad = \int _0^t {\mathbb {E}}_{\mathrm {P}\otimes \tilde{\mathrm {P}}}\left[ (\Psi _r(X_r)-\Psi _r(\tilde{X}_r)) \cdot \nabla ^2 W(X_r-\tilde{X}_r)\cdot (\Psi _r(X_r)-\Psi _r(\tilde{X}_r)) \, \right] \mathrm {d}r, \end{aligned}$$
(74)

where we used the symmetry of W to obtain the last expression. Plugging it back in (73) and using that \(t\mapsto \varphi '(t)\) is

  • continuous on [0, T[ because so are (74) and \({\mathbb {E}}[\langle M \rangle _t]\) (cf. Lemma 4.1),

  • increasing on [0, T[ since W is convex and the quadratic variation is an increasing process,

we conclude that \(t\mapsto \varphi '(t)\) is absolutely continuous on the same interval. Moreover, using the \(\kappa \)-convexity of W and again the fact that the quadratic variation is an increasing process we get

$$\begin{aligned} \varphi ''(t) \ge 2 \kappa \, {\mathbb {E}}_{\mathrm {P}}[|\Psi _t(X_t)|^2 ]=2\kappa \, \varphi '(t) \end{aligned}$$
(75)

where to establish the inequality in (75) we used that the hypotheses on \(\mu ^{\mathrm {in}}\) and \(\mu ^{\mathrm {fin}}\) together with Lemma 4.2 imply \({\mathbb {E}}_{\mathrm {P}}[\Psi _t(X_t)]=0\). The bound (70) follows by integrating the differential inequality (75), as done for instance in Lemma 5.5 in the Appendix, and observing that \(\frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^T|\Psi _r(X_r)|^2\mathrm {d}r \right] ={\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))\). To prove (71), we begin by observing that (75) also yields that

$$\begin{aligned} \forall s\in [t,T], \quad {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _s(X_s)|^2 \right] \ge \exp (2\kappa (s-t)) {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2 \right] . \end{aligned}$$
(76)

Next, by definition of entropic cost we get the trivial bound

$$\begin{aligned} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) =\frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{0}^T |\Psi _s(X_s)|^2\mathrm {d}s \right] \ge \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{t}^T |\Psi _s(X_s)|^2\mathrm {d}s \right] . \end{aligned}$$

The desired conclusion follows by plugging (76) into the above inequality and performing some standard calculations. \(\square \)
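The integration step can be made concrete in the extremal case where (75) holds with equality and \(\varphi (0)=0\), \(\varphi (T)=C:={\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\): then \(\varphi (t)=C\,(e^{2\kappa t}-1)/(e^{2\kappa T}-1)\), which saturates (70). A sketch checking the ODE, the boundary values and the bound (71), for illustrative values of \(\kappa ,T,C\):

```python
# Extremal case of (70)-(71): phi'' = 2*kappa*phi' with phi(0) = 0, phi(T) = C
# is solved by phi(t) = C*(e^{2kt} - 1)/(e^{2kT} - 1). kappa, T, C illustrative.
import math

kappa, T, C = 0.8, 2.0, 1.5

def phi(t):
    return C * (math.exp(2 * kappa * t) - 1) / (math.exp(2 * kappa * T) - 1)

def phi_prime(t):
    return C * 2 * kappa * math.exp(2 * kappa * t) / (math.exp(2 * kappa * T) - 1)

def phi_second(t):
    return 2 * kappa * phi_prime(t)  # the differential inequality, with equality

# boundary values: phi(0) = 0 and phi(T) = C, so (70) holds with equality
assert abs(phi(0.0)) < 1e-12 and abs(phi(T) - C) < 1e-12

h = 1e-5
for k in range(1, 20):
    t = T * k / 20
    # finite-difference check that phi'' = 2*kappa*phi'
    fd_second = (phi(t + h) - 2 * phi(t) + phi(t - h)) / h**2
    assert abs(fd_second - phi_second(t)) < 1e-4
    # bound (71): (1/2) E|Psi_t|^2 = phi'(t)/2 <= 2*kappa*C / (e^{2k(T-t)} - 1)
    assert phi_prime(t) / 2 <= 2 * kappa * C / (math.exp(2 * kappa * (T - t)) - 1) + 1e-12
```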

4.2 First derivative of \({\mathcal {F}}\)

We compute the first derivative of \({\mathcal {F}}\) along the marginal flow of \(\mathrm {Q}\), assuming that \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))<+\infty \) and that \(\mathrm {Q}\) is Markov. To do this, we use an approach based on optimal transport, and some results of [25]. To be self-contained, we recall the basic notions of optimal transport we need to state the results. We refer to [1] for more details.

Tangent space Let \(\mu \in {\mathcal {P}}_2({\mathbb {R}}^d)\). The tangent space \(\mathrm {Tan}_{\mu }{\mathcal {P}}_2\) at \(\mu \) is the closure in \(L^2_{\mu }\) of

$$\begin{aligned} \left\{ \nabla \psi ; \psi \in {\mathcal {C}}^{\infty }_c({\mathbb {R}}^d) \right\} . \end{aligned}$$

Since \(L^2_{\mu }\) is a Hilbert space, given an arbitrary \(\Psi \in L^2_{\mu }\), there exists a unique projection \(\Pi _{\mu }(\Psi ) \) of \(\Psi \) onto \(\mathrm {Tan}_{\mu }{\mathcal {P}}_2({\mathbb {R}}^d)\).

Absolutely continuous curves and velocity field

Following [1, Th 8.3.1], we say that a curve \((\mu _t)_{t\in [0,T]} \subseteq {\mathcal {P}}_2({\mathbb {R}}^d)\) is absolutely continuous if there exists a Borel measurable vector field \((t,z)\mapsto w_t(z)\) such that

  • \((w_t)_{t\in [0,T]}\) solves (in the sense of distributions) the continuity equation

    $$\begin{aligned} \partial _t \mu _t + \nabla \cdot (w_t \mu _t )=0. \end{aligned}$$
    (77)
  • \(w_t\) satisfies the integrability condition

    $$\begin{aligned} \int _{0}^T\left( \int _{{\mathbb {R}}^d}|w_t(z)|^2\mu _t(\mathrm {d}z)\right) ^{1/2} \mathrm {d}t<+\infty . \end{aligned}$$

Consider an absolutely continuous curve \((\mu _t)_{t\in [0,T]}\). It is a consequence of the results in Chapter 8, and in particular of Proposition 8.4.5 of [1], that there exists a unique Borel measurable vector field \(v_t(z)\) solving (77) and such that \(z\mapsto v_t(z)\) belongs to the tangent space \(\mathrm {Tan}_{\mu _t}{\mathcal {P}}_2\) for almost every \(t\in [0,T]\). We call such \(v_t\) the (tangent) velocity field of \((\mu _t)_{t\in [0,T]}\).
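A concrete instance of the tangent velocity field: for the one-dimensional heat flow started from a Gaussian, \(\mu _t={\mathcal {N}}(0,\sigma ^2+t)\), the gradient field \(w_t(z)=-\tfrac{1}{2}\partial _z\log \mu _t(z)=z/(2(\sigma ^2+t))\) solves the continuity equation (77). The finite-difference sketch below (with illustrative parameter values) checks this numerically:

```python
# The 1d heat flow mu_t = N(0, sigma^2 + t) solves d/dt mu = 1/2 Laplacian mu,
# and w_t(z) = -1/2 d/dz log mu_t(z) = z / (2(sigma^2 + t)) solves the
# continuity equation d/dt mu_t + d/dz (w_t mu_t) = 0. Checked by central
# finite differences at a few sample points; sigma^2 = 1 is illustrative.
import math

sigma2 = 1.0

def rho(t, z):  # density of N(0, sigma^2 + t)
    s = sigma2 + t
    return math.exp(-z * z / (2 * s)) / math.sqrt(2 * math.pi * s)

def w(t, z):    # candidate tangent velocity: -1/2 * d/dz log rho
    return z / (2 * (sigma2 + t))

h = 1e-4
for (t, z) in [(0.3, 0.7), (0.5, -1.2), (1.0, 0.0)]:
    dt_rho = (rho(t + h, z) - rho(t - h, z)) / (2 * h)
    dz_flux = (w(t, z + h) * rho(t, z + h) - w(t, z - h) * rho(t, z - h)) / (2 * h)
    assert abs(dt_rho + dz_flux) < 1e-6
```

Being a gradient, this w is tangent; it is also the velocity predicted by (78) with \(W=0\) and \(\Xi ^{\mathrm {Q}}=0\).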

Remark 4.1

Let \((\mu _t)_{t\in [0,T]}\) be an absolutely continuous curve and \(w_t(z)\) be in \(\mathrm {H}_{-1}((\mu _t)_{t\in [0,T]})\). It is rather easy to see that \(z\mapsto w_t(z)\) belongs to \(\mathrm {Tan}_{\mu _t}{\mathcal {P}}_2\) for almost every \(t\in [0,T]\).

Throughout the rest of the paper, if \(\mathrm {Q}\in {\mathcal {P}}(\Omega )\) is such that \({\mathcal {H}}(\mathrm {Q}| \Gamma (\mathrm {Q}))<+\infty \), we say that \(\mathrm {Q}\) is Markov if \(\alpha ^{\mathrm {Q}}_t\) is \(\sigma (X_t)\)-measurable for all \(t\in [0,T]\), \((\alpha ^{\mathrm {Q}}_t)_{t\in [0,T]}\) being defined by (8). In that case we write \(\Xi ^{\mathrm {Q}}_t(X_t)\) instead of \(\alpha ^{\mathrm {Q}}_t\).

Lemma 4.4

Let \( \mathrm {Q}\) be Markov and such that \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))<+\infty \). Then

  1. (i)

    \((\mathrm {Q}_t)_{t\in [0,T]}\) is an absolutely continuous curve. Its tangent velocity field is given by

    $$\begin{aligned} v_t(z) = -\nabla W *\mathrm {Q}_t(z) + \Pi _{\mathrm {Q}_t}(\Xi ^{\mathrm {Q}}_t(z)) - \frac{1}{2} \nabla \log \mathrm {Q}_t(z). \end{aligned}$$
    (78)

    Moreover,

    $$\begin{aligned} \int _{0}^T \int _{{\mathbb {R}}^d} |v_t|^2 \mathrm {d}\mathrm {Q}_t \mathrm {d}t <+\infty . \end{aligned}$$
    (79)
  2. (ii)

The function \(t\mapsto {\mathcal {F}}(\mathrm {Q}_t)\) is absolutely continuous and

    $$\begin{aligned} \forall 0\le & {} s\le t, \quad {\mathcal {F}}(\mathrm {Q}_t)-{\mathcal {F}}(\mathrm {Q}_s) = \int _s^t \int _{{\mathbb {R}}^d}\big (\nabla \log \mathrm {Q}_r \nonumber \\&+ 2 \nabla W*\mathrm {Q}_r \big )(z) \cdot v_r(z) \mathrm {Q}_r(\mathrm {d}z) \,\mathrm {d}r. \end{aligned}$$
    (80)

Proof

Proof of (i) To show that \((\mathrm {Q}_t)_{t\in [0,T]}\) is absolutely continuous it suffices to show that there exists a distributional solution of the continuity equation

$$\begin{aligned} \partial _t \mathrm {Q}_t + \nabla \cdot (w_t \mathrm {Q}_t)=0 \end{aligned}$$
(81)

with the property that

$$\begin{aligned} \int _{0}^T \left( \int _{{\mathbb {R}}^d} |w_t(z)|^2 \mathrm {Q}_t(\mathrm {d}z)\right) ^{1/2} \mathrm {d}t <+\infty . \end{aligned}$$
(82)

Let now \(\varphi \in {\mathcal {C}}^{\infty }_c(]0,T[ \times {\mathbb {R}}^d)\). Using Itô’s formula and taking expectation we obtain

$$\begin{aligned} \int _{0}^T \int _{{\mathbb {R}}^d}\Big (\nabla \varphi (t,z)\cdot \big ( -\nabla W *\mathrm {Q}_t(z) + \Xi ^{\mathrm {Q}}_t(z) \big ) + \frac{1}{2}\Delta \varphi (t,z) + \partial _t \varphi (t,z) \Big ) \, \mathrm {Q}_t(\mathrm {d}z)\,\mathrm {d}t=0.\nonumber \\ \end{aligned}$$
(83)

Lemma 5.2 in the appendix grants that under the current assumptions \({\mathcal {H}}(\mathrm {Q}|\mathrm {R}^{\mu ^{\mathrm {in}}})<+\infty \), where \(\mathrm {R}^{\mu ^{\mathrm {in}}}\) is the Wiener measure started at \(\mu ^{\mathrm {in}}\). But then, using [25, Thm 3.10] we obtain that \(z\mapsto \log \mathrm {Q}_t(z)\) is absolutely continuous for almost every t and that \((t,z)\mapsto \nabla \log \mathrm {Q}_t(z)\) belongs to \(\mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\). Therefore we can use integration by parts in (83) to obtain

$$\begin{aligned} \forall t\in [0,T], \quad \frac{1}{2}\int _{{\mathbb {R}}^d} \Delta \varphi (t,z)\, \mathrm {Q}_t(\mathrm {d}z) = -\frac{1}{2}\int _{{\mathbb {R}}^d} \nabla \varphi (t,z) \cdot \nabla \log \mathrm {Q}_t(z) \,\mathrm {Q}_t(\mathrm {d}z), \end{aligned}$$

which gives, using the definition of the projection operator \(\Pi _{\mathrm {Q}_t}\), that the rhs of (78) solves the continuity equation in the sense of distributions. Next, we observe that (8) grants that \(\Pi _{\mathrm {Q}_t}(\Xi ^{\mathrm {Q}}_t(z))\in \mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\). We have already shown that \(\nabla \log \mathrm {Q}_t \in \mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\), and (54) used with \({\bar{b}}=-\nabla W *\mathrm {Q}_t(z)\) yields that \(-\nabla W*\mathrm {Q}_t(z)\in \mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\). Thus \(v_t(z)\in \mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\) as well, which gives (82) and (79). Finally, Remark 4.1 yields that \((v_t)_{t\in [0,T]}\) is indeed the tangent velocity field.

Proof of (ii) From point (i) we know that \(z\mapsto \nabla \log \mathrm {Q}_t(z)\) belongs to \(L^2_{\mathrm {Q}_t}\) for almost every t; this implies that \(\nabla \log \mathrm {Q}_t + 2 \nabla W *\mathrm {Q}_t \) belongs to the subdifferential of \({\mathcal {F}}\) at \(\mathrm {Q}_t\) for almost every t (see e.g. [1, Thm. 10.4.13]). The chain rule [1, sec. E, pp. 233–234] gives the desired result (80), provided its hypotheses are verified. We have to check that (a) \((\mathrm {Q}_t)_{t\in [0,T]}\) is an absolutely continuous curve and \({\mathcal {F}}(\mathrm {Q}_t)<+\infty \) for all \(t\in [0,T]\), (b) \({\mathcal {F}}(\cdot )\) is displacement \(\lambda \)-convex for some \(\lambda \in {\mathbb {R}}\), and (c) that

$$\begin{aligned} \int _{0}^T \left( \int _{{\mathbb {R}}^d}|v_t|^2 \mathrm {d}\mathrm {Q}_t\right) ^{1/2}\left( \int _{{\mathbb {R}}^d} \Big |\nabla \log \mathrm {Q}_t + 2 \nabla W*\mathrm {Q}_t\, \Big |^2 \mathrm {d}\mathrm {Q}_t \right) ^{1/2} \mathrm {d}t <+\infty . \end{aligned}$$

To wit, (a) follows from point (i) and the fact that \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))<+\infty \), and (b) is a consequence of displacement convexity of the entropy and (H1). Finally, (c) is granted by (79) and the fact that \(\nabla \log \mathrm {Q}_t(z)+2 \nabla W*\mathrm {Q}_t(z)\) belongs to \(\mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\) (see the proof of (i)). \(\square \)

4.3 Time reversal

For \(\mathrm {Q}\in {\mathcal {P}}(\Omega )\) the time reversal \({\hat{\mathrm {Q}}}\) is the law of the time reversed process \((X_{T-t})_{t\in [0,T]}\). In this section we derive an expression for \({\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}}))\) and use it to derive the bound (91) below, which plays a fundamental role in the proof of Theorem 4.1.

Proposition 4.1

Let \(\mathrm {Q}\in {\mathcal {P}}_{1}(\Omega )\) be Markov and such that \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))<+\infty \).

  1. (i)

    If \(\mathrm {Q}_0=\mu ^{\mathrm {in}}\), \(\mathrm {Q}_T=\mu ^{\mathrm {fin}}\) then \({\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}}))<+\infty \) as well and

    $$\begin{aligned} {\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}})) ={\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))+{\mathcal {F}}(\mu ^{\mathrm {in}})-{\mathcal {F}}(\mu ^{\mathrm {fin}}) \end{aligned}$$
    (84)
  2. (ii)

    If \(\mathrm {Q}_0=\mu ^{\mathrm {fin}}\), \(\mathrm {Q}_T=\mu ^{\mathrm {in}}\) then \({\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}}))<+\infty \) as well and

    $$\begin{aligned} {\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))={\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}})) +{\mathcal {F}}(\mu ^{\mathrm {in}})-{\mathcal {F}}(\mu ^{\mathrm {fin}}) \end{aligned}$$
    (85)

Proof

We only prove (i), (ii) being completely analogous. Recalling (see Lemma 5.2) that \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))<+\infty \) implies \({\mathcal {H}}(\mathrm {Q}|\mathrm {R}^{\mu ^{\mathrm {in}}})<+\infty \), we can use [25, Thm. 3.10, Eq. 3.9] to obtain that there exists a Borel measurable vector field \(\hat{b}_t(x)\) such that

$$\begin{aligned} X_t - \int _{0}^t \hat{b}_s(X_s) \mathrm {d}s \end{aligned}$$

is a Brownian motion under \({\hat{\mathrm {Q}}}\) and that

$$\begin{aligned} {\hat{\mathrm {Q}}}-\text {a.s.} \quad \hat{b}_t(X_t) = -b_{T-t}(X_t) + \nabla \log \mathrm {Q}_{T-t}(X_t)\quad \forall t\in [0,T], \end{aligned}$$
(86)

where \(b_t(z)\) is the drift of \(\mathrm {Q}\), which, in view of Lemma 1.1, we write as \( -\nabla W *\mathrm {Q}_t(z) + \Xi ^{\mathrm {Q}}_t(z)\). Thus, we deduce that under \({\hat{\mathrm {Q}}}\) we have that

$$\begin{aligned} X_t- \int _{0}^t -\nabla W*{\hat{\mathrm {Q}}}_s(X_s)+{\hat{\Xi }}^{\mathrm {Q}}_s(X_s)\mathrm {d}s \end{aligned}$$

is a Brownian motion, where

$$\begin{aligned} {\hat{\mathrm {Q}}}-\text {a.s.}\quad {\hat{\Xi }}^{\mathrm {Q}}_t(X_t)= & {} -\Xi ^{\mathrm {Q}}_{T-t}(X_t) \nonumber \\&+ \nabla \log \mathrm {Q}_{T-t}(X_t)+2\nabla W*\mathrm {Q}_{T-t}(X_t) \quad \forall t\in [0,T]. \end{aligned}$$
(87)

In the proof of Lemma 4.4, it was shown that \((\nabla \log \mathrm {Q}_{t})_{t\in [0,T]}\), \((\nabla W *\mathrm {Q}_{t})_{t\in [0,T]}\) and \((\Xi ^{\mathrm {Q}}_t)_{t\in [0,T]}\) are all in \(\mathrm {H}_{-1}((\mathrm {Q}_t)_{t\in [0,T]})\). This implies that \(({\hat{\Xi }}^{\mathrm {Q}}_t)_{t\in [0,T]}\in \mathrm {H}_{-1}(({\hat{\mathrm {Q}}}_t)_{t\in [0,T]})\) as well. But then using \((ii)\Rightarrow (i)\) in Lemma 3.6 for the choice \({\bar{b}}(t,z)=-\nabla W*{\hat{\mathrm {Q}}}_t(z)\) we get that \({\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}}))<+\infty \) and

$$\begin{aligned} {\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}})) = \frac{1}{2}{\mathbb {E}}_{{\hat{\mathrm {Q}}}} \left[ \int _0^T|{\hat{\Xi }}^{\mathrm {Q}}_t(X_t)|^2 \mathrm {d}t \right] . \end{aligned}$$

Using (87) in the above equation we get

$$\begin{aligned}&{\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}}))\\&\quad = \frac{1}{2}{\mathbb {E}}_{{\hat{\mathrm {Q}}}} \left[ \int _0^T|\Xi ^{\mathrm {Q}}_{T-t}(X_t) -\big ( \nabla \log \mathrm {Q}_{T-t}(X_t)+2\nabla W*\mathrm {Q}_{T-t}(X_t)\big )|^2 \mathrm {d}t \right] \\&\quad = \frac{1}{2} {\mathbb {E}}_{\mathrm {Q}} \left[ \int _0^T|\Xi ^{\mathrm {Q}}_t(X_t)|^2 \mathrm {d}t\right] \\&\quad \quad + \frac{1}{2}{\mathbb {E}}_{\mathrm {Q}} \Big [ \int _0^T\big (\nabla \log \mathrm {Q}_{t}(X_t)+2\nabla W*\mathrm {Q}_{t}(X_t) \big )\cdot \\&\qquad \big (-2 \Xi ^{\mathrm {Q}}_{t}(X_t) + \nabla \log \mathrm {Q}_{t}(X_t)+ 2\nabla W*\mathrm {Q}_{t}(X_t)\big )\mathrm {d}t \Big ]\\&\quad {\mathop {=}\limits ^{(78)}} {\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q})) -{\mathbb {E}}_{\mathrm {Q}}\left[ \int _{0}^T \big (\nabla \log \mathrm {Q}_{t}(X_t)+2\nabla W*\mathrm {Q}_{t}(X_t) \big ) \cdot v_t(X_t)\mathrm {d}t\right] . \end{aligned}$$

The conclusion follows from point (ii) of Lemma 4.4. \(\square \)
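In the linear case the reversal formula (86) can be checked by hand: under the Wiener measure started from \({\mathcal {N}}(0,\sigma ^2)\) one has \(b=0\) and \(\mathrm {Q}_t={\mathcal {N}}(0,\sigma ^2+t)\), so \(\hat{b}_t(x)=\partial _x\log \mathrm {Q}_{T-t}(x)=-x/(\sigma ^2+T-t)\), and the reversed marginals must solve the corresponding Fokker–Planck equation. A finite-difference sketch with illustrative parameters:

```python
# Check of the time-reversal drift in the linear case: with b = 0 and
# Q_t = N(0, sigma^2 + t), formula (86) gives b_hat_t(x) = -x/(sigma^2 + T - t),
# and rho_hat(t, x) = density of Q_{T-t} must solve the Fokker-Planck equation
#   d/dt rho = 1/2 d^2/dx^2 rho - d/dx (b_hat * rho).
import math

sigma2, T = 1.0, 2.0

def rho_hat(t, x):  # density of N(0, sigma^2 + T - t)
    s = sigma2 + T - t
    return math.exp(-x * x / (2 * s)) / math.sqrt(2 * math.pi * s)

def b_hat(t, x):
    return -x / (sigma2 + T - t)

h = 1e-4
for (t, x) in [(0.4, 0.5), (1.0, -1.3), (1.7, 0.2)]:
    dt = (rho_hat(t + h, x) - rho_hat(t - h, x)) / (2 * h)
    dxx = (rho_hat(t, x + h) - 2 * rho_hat(t, x) + rho_hat(t, x - h)) / h**2
    dflux = (b_hat(t, x + h) * rho_hat(t, x + h) - b_hat(t, x - h) * rho_hat(t, x - h)) / (2 * h)
    assert abs(dt - 0.5 * dxx + dflux) < 1e-5
```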

A consequence of Proposition 4.1 is that optimality is preserved under time reversal.

Lemma 4.5

Let \(\mathrm {P}\) be an optimizer for (MFSP). Then \({\hat{\mathrm {P}}}\) optimizes

$$\begin{aligned} \inf \left\{ {\mathcal {H}}(\mathrm {Q}| \Gamma (\mathrm {Q}) ) \,:\, \mathrm {Q}\in {\mathcal {P}}_{1}(\Omega ),\, \mathrm {Q}_0=\mu ^{\mathrm {fin}},\, \mathrm {Q}_T=\mu ^{\mathrm {in}} \right\} . \end{aligned}$$
(88)

Proof

Let us observe that since (H2) makes no distinction between \(\mu ^{\mathrm {in}}\) and \(\mu ^{\mathrm {fin}}\), the problem (88) admits at least one optimal solution by Proposition 1.1. Applying Proposition 3.1 with the roles of \(\mu ^{\mathrm {in}}\) and \(\mu ^{\mathrm {fin}}\) interchanged, we get that the optimizers of (88) are Markov. So it suffices to show that for any Markov \(\mathrm {Q}\) admissible for (88) we have \({\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))\ge {\mathcal {H}}({\hat{\mathrm {P}}}|\Gamma ({\hat{\mathrm {P}}}))\). Take any such \(\mathrm {Q}\). We have

$$\begin{aligned} {\mathcal {H}}(\mathrm {Q}|\Gamma (\mathrm {Q}))&{\mathop {=}\limits ^{\text {Prop.}~4.1 (ii)}} {\mathcal {H}}({\hat{\mathrm {Q}}}|\Gamma ({\hat{\mathrm {Q}}})) + {\mathcal {F}}(\mu ^{\mathrm {in}})-{\mathcal {F}}(\mu ^{\mathrm {fin}})\\&{\mathop {\ge }\limits ^{\text {Opt. of } \mathrm {P}}} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P})) + {\mathcal {F}}(\mu ^{\mathrm {in}})-{\mathcal {F}}(\mu ^{\mathrm {fin}})\\&{\mathop {=}\limits ^{\text {Prop.}~4.1 (i)}}{\mathcal {H}}({\hat{\mathrm {P}}}|\Gamma ({\hat{\mathrm {P}}})). \end{aligned}$$

\(\square \)

4.4 Functional inequalities: proofs and the behaviour of \({\mathcal {F}}\)

The goal of this section is to prove Theorem 1.6 as well as the Talagrand and HWI inequalities. The latter are corollaries of the following new result concerning the behaviour of \({\mathcal {F}}\) along bridges:

Theorem 4.1

Assume (H1)–(H4) and let \(T>0\) be fixed. If \(\mathrm {P}\) is an optimizer for (MFSP), then for all \(t\in [0,T]\) we have

$$\begin{aligned} {\mathcal {F}}(\mathrm {P}_t)\le & {} \frac{\exp (2\kappa (T-t))-1}{\exp (2\kappa T)-1}{\mathcal {F}}(\mu ^{\mathrm {in}}) + \frac{\exp (2\kappa T)-\exp (2\kappa (T-t))}{\exp (2\kappa T)-1} {\mathcal {F}}(\mu ^{\mathrm {fin}})\nonumber \\&- \frac{ (\exp (2\kappa (T-t) )-1) (\exp (2\kappa t )-1 )}{\exp (2\kappa T)-1} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}). \end{aligned}$$
(89)

This bound generalizes to the mean field setup the results of [17], and may be seen as a rigorous version of some of the heuristic arguments put forward in [28] and [34], upon slightly modifying the definition of \({\mathscr {C}}_T\).
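The right-hand side of (89) is a convex combination of \({\mathcal {F}}(\mu ^{\mathrm {in}})\) and \({\mathcal {F}}(\mu ^{\mathrm {fin}})\) minus a nonnegative multiple of the cost. A quick numerical sketch (with illustrative values of \(\kappa \) and T) confirms that the first two coefficients sum to one, that the bound reduces to \({\mathcal {F}}(\mu ^{\mathrm {in}})\) at \(t=0\) and \({\mathcal {F}}(\mu ^{\mathrm {fin}})\) at \(t=T\), and that as \(\kappa \rightarrow 0\) one recovers the linear interpolation weights \((T-t)/T\) and t/T while the cost coefficient vanishes:

```python
# Structure of the coefficients in (89); kappa and T are illustrative values.
import math

def coeffs(kappa, t, T):
    e = lambda u: math.exp(2 * kappa * u)
    a = (e(T - t) - 1) / (e(T) - 1)               # weight of F(mu_in)
    b = (e(T) - e(T - t)) / (e(T) - 1)            # weight of F(mu_fin)
    c = (e(T - t) - 1) * (e(t) - 1) / (e(T) - 1)  # coefficient of the cost
    return a, b, c

T = 2.0
for kappa in [0.5, 1.0, 3.0]:
    a0, b0, c0 = coeffs(kappa, 0.0, T)   # at t = 0 the bound reads F(mu_in)
    aT, bT, cT = coeffs(kappa, T, T)     # at t = T the bound reads F(mu_fin)
    assert abs(a0 - 1) < 1e-12 and abs(b0) < 1e-12 and abs(c0) < 1e-12
    assert abs(aT) < 1e-12 and abs(bT - 1) < 1e-12 and abs(cT) < 1e-12
    a, b, c = coeffs(kappa, 0.7, T)
    assert abs(a + b - 1) < 1e-12        # convex combination of the endpoints

# small-kappa limit: linear interpolation weights, vanishing cost coefficient
a, b, c = coeffs(1e-6, 0.7, T)
assert abs(a - (T - 0.7) / T) < 1e-4 and abs(b - 0.7 / T) < 1e-4 and abs(c) < 1e-4
```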

4.4.1 Proof of Theorem 4.1

Using a time reversal argument, we prove the bound (91), which, together with the corrector bound (70), is the key ingredient in the proof of Theorem 4.1.

The backward corrector \({\hat{\Psi }}\) is constructed by the same argument used in Proposition 4.1, with (MFSP) replaced by (88): there exists a Borel measurable vector field \({\hat{\Psi }}_t(z)\in \mathrm {H}_{-1}(({\hat{\mathrm {P}}}_t)_{t\in [0,T]})\) such that

$$\begin{aligned} X_t -\int _{0}^t\left( -\nabla W*{\hat{\mathrm {P}}}_{s}(X_s) +{\hat{\Psi }}_s(X_s)\right) \mathrm {d}s \end{aligned}$$

is a Brownian motion under \({\hat{\mathrm {P}}}\). Moreover, the following relation holds

$$\begin{aligned}&{\hat{\mathrm {P}}}-\text {a.s.}\quad {\hat{\Psi }}_t(X_t)= -\Psi _{T-t}(X_t) + \nabla \log \mathrm {P}_{T-t}(X_t)\nonumber \\&\quad +2\nabla W*\mathrm {P}_{T-t}(X_t) \quad \forall t\in [0,T]. \end{aligned}$$
(90)

Lemma 4.6

Assume (H3), (H4) and let \(\mathrm {P}\) be an optimizer for (MFSP). Then

$$\begin{aligned}&{\mathcal {F}}(\mathrm {P}_r)+ \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{r}^{T} |\Psi _s(X_s)|^2 \mathrm {d}s \right] \le \frac{\exp (2\kappa (T-r))-1}{\exp (2\kappa T)-1} {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) \nonumber \\&\quad + \frac{\exp (2\kappa (T-r))-1}{\exp (2\kappa T)-1}{\mathcal {F}}(\mu ^{\mathrm {in}}) + \frac{\exp (2\kappa T)-\exp (2\kappa (T-r))}{\exp (2\kappa T)-1} {\mathcal {F}}(\mu ^{\mathrm {fin}}) \end{aligned}$$
(91)

and

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ |{\hat{\Psi }}_{T-r}(X_r)|^2 \right] \le \frac{2\kappa \, {\mathscr {C}}_T(\mu ^{\mathrm {fin}},\mu ^{\mathrm {in}})}{\exp (2\kappa r)-1} \end{aligned}$$
(92)

hold for all \(r\in (0,T)\).

Proof

Using (78) we can rewrite (90) as

$$\begin{aligned} {\hat{\mathrm {P}}}-\text {a.s.}\quad {\hat{\Psi }}_t(X_t)= \Psi _{T-t}(X_t) -2v_{T-t}(X_t) \quad \forall t\in [0,T] \end{aligned}$$
(93)

From Proposition 4.5 we also know that \({\hat{\mathrm {P}}}\) is optimal for (88) and hence that \({\mathcal {H}}({\hat{\mathrm {P}}}|\Gamma ({\hat{\mathrm {P}}}))={\mathscr {C}}_T(\mu ^{\mathrm {fin}},\mu ^{\mathrm {in}})\). Therefore, by again swapping the roles of \(\mu ^{\mathrm {in}}\) and \(\mu ^{\mathrm {fin}}\), we can apply Lemma 4.3 to the problem (88) with \(t=T-r\) and derive that

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{{\hat{\mathrm {P}}}}\left[ \int _{0}^{T-r} |{\hat{\Psi }}_s(X_s)|^2\mathrm {d}s \right] \le \frac{\exp (2\kappa (T-r))-1}{\exp (2\kappa T)-1} {\mathcal {H}}({\hat{\mathrm {P}}}|\Gamma ({\hat{\mathrm {P}}})). \end{aligned}$$
(94)

Thanks to (93) we can write

$$\begin{aligned}&\frac{1}{2}{\mathbb {E}}_{{\hat{\mathrm {P}}}}\left[ \int _{0}^{T-r}|{\hat{\Psi }}_s(X_s)|^2\mathrm {d}s \right] = \frac{1}{2} {\mathbb {E}}_{\mathrm {P}} \left[ \int _r^T|\Psi _s(X_s)|^2 \mathrm {d}s\right] \nonumber \\&\qquad -{\mathbb {E}}_{\mathrm {P}} \left[ \int _r^T\big (2\Psi _s(X_s)-2v_s(X_s)) v_s(X_s) \mathrm {d}s \right] \nonumber \\&\quad {\mathop {=}\limits ^{(78)+\Psi \in \mathrm {H}_{-1}}} \frac{1}{2} {\mathbb {E}}_{\mathrm {P}} \left[ \int _r^T|\Psi _s(X_s)|^2 \mathrm {d}s\right] \nonumber \\&\qquad -{\mathbb {E}}_{\mathrm {P}}\left[ \int _{r}^T \big (\nabla \log \mathrm {P}_{s}(X_s)+2\nabla W*\mathrm {P}_{s}(X_s) \big ) \cdot v_s(X_s)\,\mathrm {d}s\right] \nonumber \\&\quad {\mathop {=}\limits ^{(80)}}\frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _{r}^{T} |\Psi _s(X_s)|^2\mathrm {d}s\right] +{\mathcal {F}}(\mathrm {P}_r)-{\mathcal {F}}(\mu ^{\mathrm {fin}}). \end{aligned}$$
(95)

The bound (91) follows by plugging (95) into (94) and then using (84) together with the fact that \({\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))={\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\). The proof of (92) goes along the same lines: since \({\hat{\mathrm {P}}}\) is optimal for (88), we also get from Lemma 4.3, and in particular from (71) with the choice \(t=T-r\), that

$$\begin{aligned} \frac{1}{2}{\mathbb {E}}_{{\hat{\mathrm {P}}}}\left[ |{\hat{\Psi }}_{T-r}(X_{T-r})|^2 \right] \le \frac{2\kappa \, {\mathscr {C}}_T(\mu ^{\mathrm {fin}},\mu ^{\mathrm {in}})}{\exp (2\kappa r)-1}. \end{aligned}$$

\(\square \)
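As a numerical aside (not part of the proof), the coefficient \(2\kappa /(\exp (2\kappa r)-1)\) on the right-hand side of (92) is decreasing in r, so the bound improves as r moves away from the initial endpoint, and it tends to the \(\kappa \)-independent rate 1/r as \(\kappa \rightarrow 0\); a quick check:

```python
import math

# Coefficient of the cost on the right-hand side of the bound (92).
def rate(r, kappa):
    return 2 * kappa / (math.exp(2 * kappa * r) - 1)

kappa = 0.5
vals = [rate(r, kappa) for r in (0.1, 0.5, 1.0, 2.0)]
assert all(a > b for a, b in zip(vals, vals[1:]))  # the bound improves as r grows
assert abs(rate(1.0, 1e-9) - 1.0) < 1e-6           # kappa -> 0 limit is 1/r
```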

Given all the preparatory work, the proofs of Theorem 4.1 and of its corollaries stated in the introduction are now straightforward.

Proof of Theorem 4.1

It suffices to add (70) and (91) with the choice \(r=t\), and to use the relation

$$\begin{aligned} {\mathcal {H}}(\mathrm {P}|\Gamma (\mathrm {P}))=\frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ \int _0^T|\Psi _t(X_t)|^2\mathrm {d}t\right] = {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}). \end{aligned}$$

\(\square \)

Proof of Corollary 1.3

It follows from Theorem 4.1 (Eq. (89)), observing that \({\mathcal {F}}(\mathrm {P}_t)\ge 0 \). \(\square \)

Proof of Corollary 1.4

Combining (90), (93) and (31) we get that

$$\begin{aligned} \int _{{\mathbb {R}}^d} |v_t|^2(x) \mathrm {P}_t(\mathrm {d}x) = - {\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu _{\infty }) + \frac{1}{4} {\mathcal {I}}_{{\mathcal {F}}}(\mathrm {P}_t). \end{aligned}$$

Using the above relation, the Cauchy–Schwarz inequality, the continuity of \({\mathcal {I}}_{{\mathcal {F}}}(\mathrm {P}_t)\) in a neighborhood of 0, and (80), we get that

$$\begin{aligned} \liminf _{t\rightarrow 0} \frac{1}{t}({\mathcal {F}}(\mathrm {P}_t)-{\mathcal {F}}(\mathrm {P}_0) ) \ge - \left( {\mathcal {I}}_{{\mathcal {F}}}(\mu ^{\mathrm {in}}) \big ( \frac{1}{4}{\mathcal {I}}_{{\mathcal {F}}}(\mu ^{\mathrm {in}})- {\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu _{\infty }) \big ) \right) ^{1/2}.\nonumber \\ \end{aligned}$$
(96)

Consider now the bound (89). Observing that \({\mathcal {F}}(\mu _{\infty })=0\), subtracting \({\mathcal {F}}(\mu ^{\mathrm {in}})\) on both sides, dividing by t, letting \(t\rightarrow 0\), using (96) and finally rearranging the resulting terms we get (28). \(\square \)

4.4.2 Proof of Theorem 1.6

We prove here Theorem 1.6 of the introduction. In the proof we will write

$$\begin{aligned} \int _{{\mathbb {R}}^d} \nabla ^2 W(\hat{X}_t-y)\cdot ({\hat{\Psi }}_t(\hat{X}_t) - {\hat{\Psi }}_t(y) ) \, {\hat{\mathrm {P}}}_t(\mathrm {d}y), \end{aligned}$$

instead of

$$\begin{aligned} \tilde{{\mathbb {E}}}_{\tilde{\mathrm {P}}} \left[ \nabla ^2 W(X_s-\tilde{X}_s)\cdot (\Psi _s(X_s) - \Psi _s(\tilde{X}_s) )\right] , \end{aligned}$$

which is used in the rest of the article. This is done in order to better deal with time reversal.

Proof of Theorem 1.6

Let \(M_t\) be the martingale defined at (14). Since \({\hat{\mathrm {P}}}\) is optimal for (88), from Proposition 3.2 we get that

$$\begin{aligned} \hat{M}_t={\hat{\Psi }}_t(\hat{X}_t)- \int _{0}^t \int _{{\mathbb {R}}^d} \nabla ^2 W(\hat{X}_s-y)\cdot ({\hat{\Psi }}_s(\hat{X}_s) - {\hat{\Psi }}_s(y) ) \, {\hat{\mathrm {P}}}_s(\mathrm {d}y) \, \mathrm {d}s \end{aligned}$$

is an \(L^2\)-martingale on [0, T[ under \({\hat{\mathrm {P}}}\). We define the stochastic processes

$$\begin{aligned} A_t := \int _{{\mathbb {R}}^d} \nabla ^2 W(X_t-y)\cdot (\Psi _t(X_t) - \Psi _t(y) ) \, \mathrm {P}_t(\mathrm {d}y) \end{aligned}$$

and

$$\begin{aligned} \hat{A}_t:=\int _{{\mathbb {R}}^d} \nabla ^2 W(\hat{X}_t-y)\cdot ({\hat{\Psi }}_t(\hat{X}_t) - {\hat{\Psi }}_t(y) ) \, {\hat{\mathrm {P}}}_t(\mathrm {d}y). \end{aligned}$$

We have, using the Markovianity of both \(\mathrm {P}\) and \({{\hat{\mathrm {P}}}}\), that

$$\begin{aligned}&{\mathbb {E}}_{\mathrm {P}}\left[ \Psi _t(X_t)\cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t}) \right] \\&\quad = {\mathbb {E}}_{\mathrm {P}}\left[ (M_t +\int _{0}^t A_s \mathrm {d}s) \cdot (\hat{M}_{T-t} +\int _{0}^{T-t} \hat{A}_{s} \mathrm {d}s ) \right] \\&\quad = {\mathbb {E}}_{\mathrm {P}}\left[ \left( {\mathbb {E}}_{\mathrm {P}}[M_T | X_{[0,t]}] +\int _{0}^t A_s \mathrm {d}s\right) \cdot \left( {\mathbb {E}}_{\mathrm {P}}[\hat{M}_{T}|\hat{X}_{[0,T-t]} ] +\int _{0}^{T-t} \hat{A}_{s} \mathrm {d}s \right) \right] \\&\quad = {\mathbb {E}}_{\mathrm {P}}\left[ {\mathbb {E}}_{\mathrm {P}}[\Psi _T(X_T) -\int _{t}^TA_s\mathrm {d}s\, | X_{[0,t]}] \cdot {\mathbb {E}}_{\mathrm {P}}[{\hat{\Psi }}_T(\hat{X}_T) -\int _{T-t}^{T} \hat{A}_{s} \mathrm {d}s\, |\hat{X}_{[0,T-t]} ] \right] \\&\quad = {\mathbb {E}}_{\mathrm {P}}\left[ {\mathbb {E}}_{\mathrm {P}}[\Psi _T(X_T) -\int _{t}^TA_s\mathrm {d}s\, | X_t] \cdot {\mathbb {E}}_{\mathrm {P}}[{\hat{\Psi }}_T(\hat{X}_T) -\int _{T-t}^{T} \hat{A}_{s} \mathrm {d}s\, |\hat{X}_t ]\right] \\&\quad = {\mathbb {E}}_{\mathrm {P}}\left[ \left( \Psi _T(X_T)-\int _{t}^T A_s\mathrm {d}s\right) \cdot \left( {\hat{\Psi }}_T(\hat{X}_T) -\int _{T-t}^{T} \hat{A}_{s} \mathrm {d}s\,\right) \right] . \end{aligned}$$

Therefore,

$$\begin{aligned}&\frac{\mathrm {d}}{\mathrm {d}t}{\mathbb {E}}_{\mathrm {P}}\left[ \Psi _t(X_t)\cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t}) \right] \\&\quad = {\mathbb {E}}_{\mathrm {P}}\left[ -A_t \cdot ( {\hat{\Psi }}_T(\hat{X}_T)-\int _{T-t}^{T} \hat{A}_{s}\,\mathrm {d}s) + \hat{A}_{T-t}\cdot (\Psi _T(X_T)-\int _{t}^{T} A_s \mathrm {d}s)\right] \\&\quad ={\mathbb {E}}_{\mathrm {P}}\left[ -A_t \cdot ( \hat{M}_T-\hat{M}_{T-t} + {\hat{\Psi }}_{T-t}(\hat{X}_{T-t})) + \hat{A}_{T-t}\cdot (M_T-M_t+ \Psi _t(X_t) ) \right] . \end{aligned}$$

Taking conditional expectation w.r.t. \(\sigma (X_{[0,t]})\) and using that both \(A_t\) and \(\hat{A}_{T-t}\) are \(X_{[0,t]}\)-measurable we get that the above expression equals

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}}\left[ -A_t \cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t}) + \hat{A}_{T-t}\cdot \Psi _t(X_t) \right] . \end{aligned}$$

Using the fact that W is symmetric and the definitions of \(A_t\) and \(\hat{A}_{T-t}\), one easily obtains that the latter expression vanishes. Indeed it holds that

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}}\left[ A_t \cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t})\right]= & {} {\mathbb {E}}_{\mathrm {P}}\left[ \hat{A}_{T-t}\cdot \Psi _t(X_t) \right] \\= & {} \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} ({\hat{\Psi }}_{T-t}(x)-{\hat{\Psi }}_{T-t}(y))\cdot \nabla ^2W(x-y)\cdot (\Psi _{t}(x)\\&-\Psi _{t}(y)) \mathrm {P}_t(\mathrm {d}x)\mathrm {P}_t(\mathrm {d}y). \end{aligned}$$

This concludes the proof that the function (31) is constant on (0, T). In order to establish (32) we set \(t=T/2\) in (31) and apply the Cauchy–Schwarz inequality to get that

$$\begin{aligned} |{\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})| \le \left( {\mathbb {E}}_{\mathrm {P}}[|\Psi _{T/2}(X_{T/2})|^2]{\mathbb {E}}_{\mathrm {P}}[|{\hat{\Psi }}_{T/2} (\hat{X}_{T/2})|^2]\right) ^{1/2}. \end{aligned}$$

The desired conclusion follows from (71) and (92). \(\square \)
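The cancellation used above, namely \({\mathbb {E}}_{\mathrm {P}}[A_t\cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t})]={\mathbb {E}}_{\mathrm {P}}[\hat{A}_{T-t}\cdot \Psi _t(X_t)]\) for a symmetric W, can be illustrated on a discrete measure. A minimal sketch, where the empirical measure of n points stands in for \(\mathrm {P}_t\), random vectors stand in for the two vector fields, and \(W(z)=|z|^4/4\) is an illustrative even potential (all hypothetical choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 3
x = rng.normal(size=(n, d))        # support points of the discrete measure
psi = rng.normal(size=(n, d))      # stand-in for Psi_t
psi_hat = rng.normal(size=(n, d))  # stand-in for hat-Psi_{T-t}

def hess_W(z):
    # Hessian of the even example potential W(z) = |z|^4 / 4:
    # grad^2 W(z) = |z|^2 I + 2 z z^T, symmetric with hess_W(-z) = hess_W(z).
    return np.dot(z, z) * np.eye(len(z)) + 2.0 * np.outer(z, z)

def interaction(i, field):
    # Discrete analogue of  int grad^2 W(x_i - y) . (field(x_i) - field(y)) P_t(dy)
    return sum(hess_W(x[i] - x[j]) @ (field[i] - field[j]) for j in range(n)) / n

lhs = sum(interaction(i, psi) @ psi_hat[i] for i in range(n)) / n      # E[A_t . hat-Psi]
rhs = sum(interaction(i, psi_hat) @ psi[i] for i in range(n)) / n      # E[hat-A . Psi]
assert np.isclose(lhs, rhs)  # symmetry of W makes the two expectations agree
```

Swapping the roles of x and y in the double sum, exactly as in the display above, shows why the two quantities coincide for any even W.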

4.5 Convergence to equilibrium: proofs

4.5.1 Proof of Theorem 1.4

Proof of Theorem 1.4

Lemma 4.4 gives

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} {\mathcal {F}}(\mathrm {P}_t)&{\mathop {=}\limits ^{(80)}} {\mathbb {E}}_{\mathrm {P}}\left[ \left( \nabla \log \mathrm {P}_t(X_t) + 2\nabla W*\mathrm {P}_t(X_t)\right) \cdot v_t(X_t)\right] \\&{\mathop {=}\limits ^{(90)+(93)}} \frac{1}{2} {\mathbb {E}}_{\mathrm {P}}\left[ \left( \Psi _t(X_t)+{\hat{\Psi }}_{T-t}(\hat{X}_{T-t}) \right) \cdot \left( \Psi _t(X_t)-{\hat{\Psi }}_{T-t}(\hat{X}_{T-t})\right) \right] \\&=\frac{1}{2}{\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2-|{\hat{\Psi }}_{T-t}(\hat{X}_{T-t})|^2\right] . \end{aligned}$$

Reasoning as in the proof of Lemma 4.3 we get that both \({\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2 \right] \) and \({\mathbb {E}}_{\mathrm {P}}\left[ |{\hat{\Psi }}_{T-t}(\hat{X}_{T-t})|^2 \right] \) are differentiable as functions of t in the open interval (0, T). Moreover

$$\begin{aligned}&\frac{1}{2}\frac{\mathrm {d}}{\mathrm {d}t}{\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2-|{\hat{\Psi }}_{T-t}(\hat{X}_{T-t})|^2\right] \\&\quad {\mathop {\ge }\limits ^{(72)+(75)}} \kappa {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2+|{\hat{\Psi }}_{T-t}(\hat{X}_{T-t})|^2\right] \\&\quad =\kappa \, {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)+{\hat{\Psi }}_{T-t}(\hat{X}_{T-t})|^2 -2\Psi _t(X_t)\cdot {\hat{\Psi }}_{T-t}(\hat{X}_{T-t})\right] \\&\quad {\mathop {=}\limits ^{(87)}}\kappa \,{\mathbb {E}}_{\mathrm {P}}\left[ |\nabla \log \mathrm {P}_t(X_t) + 2\nabla W*\mathrm {P}_t(X_t)|^2 \right] -2\kappa \, {\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}). \end{aligned}$$

The \(\kappa \)-convexity of W and the fact that the center of mass \({\mathbb {E}}_{\mathrm {P}}[X_t]\) is constant (see Lemma 4.2) allow us to use the logarithmic Sobolev inequality [12, (ii), Thm 2.2] to obtain

$$\begin{aligned} \kappa {\mathbb {E}}_{\mathrm {P}}\left[ |\nabla \log \mathrm {P}_t(X_t) + 2\nabla W*\mathrm {P}_t(X_t)|^2 \right] \ge 4\kappa ^2 {\mathcal {F}}(\mathrm {P}_t). \end{aligned}$$

Thus we have proven that for almost every \(t\in [0,T]\)

$$\begin{aligned} \frac{\mathrm {d}^2}{\mathrm {d}t^2} {\mathcal {F}}(\mathrm {P}_t)\ge 4\kappa ^2{\mathcal {F}}(\mathrm {P}_t) -2\kappa {\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}), \end{aligned}$$

from which (22) follows by integrating this differential inequality (see Lemma 5.6). Setting \(t=\theta T\) in (22) and using (32) to bound the conserved quantity gives (23) after some standard calculations. \(\square \)
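For the reader's convenience, here is a sketch of the elementary comparison argument behind the integration step (carried out in Lemma 5.6); we abbreviate \({\mathscr {E}}:={\mathscr {E}}_{\mathrm {P}}(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}})\). Let g solve the ODE obtained by turning the differential inequality into an equality, with matching boundary values:

```latex
g''(t) = 4\kappa^2 g(t) - 2\kappa\,\mathscr{E},
\qquad
g(t) = A\,e^{2\kappa t} + B\,e^{-2\kappa t} + \frac{\mathscr{E}}{2\kappa},
```

where the constants A, B are fixed by \(g(0)={\mathcal {F}}(\mu ^{\mathrm {in}})\) and \(g(T)={\mathcal {F}}(\mu ^{\mathrm {fin}})\), and the constant \({\mathscr {E}}/(2\kappa )\) is the particular solution. Then \(h:={\mathcal {F}}(\mathrm {P}_t)-g(t)\) satisfies \(h''\ge 4\kappa ^2 h\) with \(h(0)=h(T)=0\); at an interior positive maximum one would have \(h''\le 0 < 4\kappa ^2 h\), which is impossible, whence \(h\le 0\), i.e. \({\mathcal {F}}(\mathrm {P}_t)\le g(t)\).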

4.5.2 Proof of Theorem 1.5

Proof of Theorem 1.5

Let \(\mathrm {P}\) be optimal for (MFSP) and \(\Psi \) be given by Proposition 3.1. Then if we define

$$\begin{aligned} B_t:=X_t - \int _0^t \left( -\nabla W *\mathrm {P}_s(X_s) + \Psi _s(X_s)\right) \mathrm {d}s \end{aligned}$$

the process \((B_t)_{t\in [0,T]}\) is a Brownian motion under \(\mathrm {P}\). Since the McKean–Vlasov SDE admits a unique strong solution, there exists a measurable map \({\mathbb {Y}}:\Omega \longrightarrow \Omega \) such that \(Y:={\mathbb {Y}}(B_{\cdot })\) satisfies \(Y_0=X_0\,(\mathrm {P}-\text {a.s.}) \) and

$$\begin{aligned} \mathrm {P}-\text {a.s.} \quad Y_t = Y_0 - \int _{0}^t \nabla W *\mathrm {P}^{\text {\tiny {MKV}}}_s(Y_s) \mathrm {d}s + B_t. \end{aligned}$$

Define now \(\delta (t)={\mathbb {E}}_{\mathrm {P}}[|X_t-Y_t|^2]\). Using Itô’s formula we get that \(\delta (t)\) is differentiable with derivative

$$\begin{aligned} \delta '(t)= & {} -2{\mathbb {E}}_{\mathrm {P}}[(X_t-Y_t)\cdot (\nabla W*\mathrm {P}_t(X_t)-\nabla W*\mathrm {P}^{\text {\tiny {MKV}}}_t(Y_t) )] \\&+ 2{\mathbb {E}}_{\mathrm {P}}[(X_t-Y_t)\cdot \Psi _t(X_t) ]. \end{aligned}$$

The same arguments as in Lemma 4.3 give

$$\begin{aligned} 2{\mathbb {E}}_{\mathrm {P}}[(X_t-Y_t)\cdot (\nabla W*\mathrm {P}_t(X_t)-\nabla W*\mathrm {P}^{\text {\tiny {MKV}}}_t(Y_t) )] \ge 2\kappa {\mathbb {E}}_{\mathrm {P}}[|X_t-Y_t|^2] \ge 0. \end{aligned}$$

Moreover, by the Cauchy–Schwarz inequality:

$$\begin{aligned} {\mathbb {E}}_{\mathrm {P}}[(X_t-Y_t)\cdot \Psi _t(X_t)] \le {\mathbb {E}}_{\mathrm {P}}\left[ |X_t-Y_t|^2\right] ^{1/2}{\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2\right] ^{1/2}. \end{aligned}$$

Therefore

$$\begin{aligned} \delta '(t) \le 2 \delta (t)^{1/2}{\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2\right] ^{1/2} \end{aligned}$$

which gives

$$\begin{aligned} (\sqrt{\delta })'(t) \le {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _t(X_t)|^2\right] ^{1/2}. \end{aligned}$$

Integrating the differential inequality and using that \(\delta (0)=0\):

$$\begin{aligned} \sqrt{\delta }(t) \le \int _{0}^t {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _s(X_s)|^2\right] ^{1/2} \mathrm {d}s \le t^{1/2} \Big ( \int _{0}^t {\mathbb {E}}_{\mathrm {P}}\left[ |\Psi _s(X_s)|^2\right] \mathrm {d}s\Big )^{1/2} \\ {\mathop {\le }\limits ^{(70)}} t^{1/2} \Big ( 2 \frac{\exp (2\kappa t)-1}{\exp (2\kappa T)-1} \, {\mathscr {C}}_T(\mu ^{\mathrm {in}},\mu ^{\mathrm {fin}}) \Big )^{1/2}. \end{aligned}$$

The conclusion follows from (25) and the observation that \({\mathcal {W}}_2^2(\mathrm {P}_t,\mathrm {P}^{\text {\tiny {MKV}}}_t) \le \delta (t)\). \(\square \)
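As a numerical aside (with a unit cost \({\mathscr {C}}_T=1\) chosen purely for illustration), squaring the last display gives \(\delta (t)\le 2t(\exp (2\kappa t)-1)/(\exp (2\kappa T)-1)\,{\mathscr {C}}_T\), and for fixed t this right-hand side decreases in the horizon T and vanishes as \(T\rightarrow \infty \); a quick check:

```python
import math

# Squared form of the final estimate:
# delta(t) <= 2 t (e^{2 kappa t} - 1)/(e^{2 kappa T} - 1) * C_T.
def delta_bound(t, T, kappa, cost=1.0):
    return 2 * t * (math.exp(2 * kappa * t) - 1) / (math.exp(2 * kappa * T) - 1) * cost

t, kappa = 1.0, 0.5
vals = [delta_bound(t, T, kappa) for T in (2.0, 5.0, 10.0, 50.0)]
assert all(a > b for a, b in zip(vals, vals[1:]))  # decreasing in the horizon T
assert delta_bound(t, 50.0, kappa) < 1e-10         # vanishes as T -> infinity
```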