1 Introduction

Let \(\mu \) be a probability measure on \({\mathbb {R}}^d\) and \(X \sim \mu \). Denote by \(\mathrm {h}(\mu )\) the differential entropy of \(\mu \), defined as

$$\begin{aligned} \mathrm {h}(\mu ): = \mathrm {h}(X) = -\int \limits _{{\mathbb {R}}^d}\ln \left( \frac{d\mu }{dx}\right) d\mu . \end{aligned}$$

One of the fundamental results of information theory is the celebrated Shannon–Stam inequality which asserts that for independent vectors X, Y and \(\lambda \in (0,1)\)

$$\begin{aligned} \mathrm {h}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y\right) \ge \lambda \mathrm {h}(X) + (1-\lambda )\mathrm {h}(Y). \end{aligned}$$
(1)

We remark that Stam [24] actually proved the equivalent statement

$$\begin{aligned} e^{\frac{2\mathrm {h}(X+Y)}{d}} \ge e^{\frac{2\mathrm {h}(X)}{d}} + e^{\frac{2\mathrm {h}(Y)}{d}}, \end{aligned}$$
(2)

first observed by Shannon in [23], and known today as the entropy power inequality. To state yet another equivalent form of the inequality, for any positive-definite matrix, \(\varSigma \), we set \(\gamma _\varSigma \) as the centered Gaussian measure on \({\mathbb {R}}^d\) with density

$$\begin{aligned} \frac{d\gamma _{\varSigma }(x)}{dx} = \frac{e^{-\frac{\langle x, \varSigma ^{-1}x\rangle }{2}}}{\sqrt{\det (2\pi \varSigma )}}. \end{aligned}$$

For the case where the covariance matrix is the identity, \(\mathrm {I}_d\), we will also write \(\gamma := \gamma _{\mathrm {I}_d}\). If \(Y \sim \nu \) we set the relative entropy of X with respect to Y as

$$\begin{aligned} \mathrm {D}(\mu ||\nu ): = \mathrm {D}(X||Y) = \int \limits _{{\mathbb {R}}^d}\ln \left( \frac{d\mu }{d\nu }\right) d\mu . \end{aligned}$$

For \(G \sim \gamma \), the differential entropy is related to the relative entropy by

$$\begin{aligned} \mathrm {D}(X ||G)&= -\mathrm {h}(X) + \frac{1}{2}{\mathbb {E}}\left[ \left\Vert X\right\Vert _2^2\right] +\frac{d}{2}\ln (2\pi ). \end{aligned}$$

Thus, when X and Y are independent and centered the statement

$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y\big |\big |G\right) \le \lambda \mathrm {D}(X||G) + (1 -\lambda )\mathrm {D}(Y||G), \end{aligned}$$
(3)

is equivalent to (1). Shannon noted that in the case that X and Y are Gaussians with proportional covariance matrices, both sides of (2) are equal. Later, in [24] it was shown that this is actually a necessary condition for the equality case. We define the deficit in (3) as

$$\begin{aligned} \delta _{EPI, \lambda }(\mu ,\nu ):= & {} \delta _{EPI, \lambda }(X,Y)= \Bigl (\lambda \mathrm {D}(X||G) + (1 -\lambda )\mathrm {D}(Y||G)\Bigr )\\&- \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y\big |\big |G\right) , \end{aligned}$$

and are led to the question: what can be said about X and Y when \(\delta _{EPI, \lambda }(X,Y)\) is small? One might expect that, in light of the equality cases, a small deficit in (3) should imply that X and Y are both close, in some sense, to a Gaussian. A recent line of work has focused on making this intuition precise (see e.g., [6, 26]), which is also our main goal in the present work. In particular, we give the first stability estimate in terms of relative entropy. A good starting point is the work of Courtade et al. [6], which considers stability in terms of the Wasserstein distance (also known as the quadratic transportation distance). The Wasserstein distance is defined by

$$\begin{aligned} \mathcal {W}_2(\mu ,\nu ) =\inf \limits _{\pi }\sqrt{\int \limits _{{\mathbb {R}}^{2d}}\left\Vert x-y\right\Vert _2^2d\pi (x,y)}, \end{aligned}$$

where the infimum is taken over all couplings \(\pi \) whose marginal laws are \(\mu \) and \(\nu \). A crucial observation made in their work is that without further assumptions on the measures \(\mu \) and \(\nu \), one should not expect meaningful stability results to hold. Indeed, for any \(\lambda \in (0,1)\) they show that there exists a family of measures \(\{\mu _\varepsilon \}_{\varepsilon > 0}\) such that \(\delta _{EPI, \lambda }(\mu _\varepsilon ,\mu _\varepsilon ) < \varepsilon \) and such that for any Gaussian measure \(\gamma _\varSigma \), \(\mathcal {W}_2(\mu _\varepsilon , \gamma _\varSigma ) \ge \frac{1}{3}\). Moreover, one may take \(\mu _\varepsilon \) to be a mixture of Gaussians. Thus, in order to derive quantitative bounds it is necessary to consider a more restricted class of measures. We focus on the class of log-concave measures which, as our method demonstrates, turns out to be natural in this context.
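
For illustration, all quantities are explicit when X and Y are one-dimensional centered Gaussians: by the relation between \(\mathrm {D}\) and \(\mathrm {h}\), the deficit reduces to \(\mathrm {h}(\sqrt{\lambda }X + \sqrt{1-\lambda }Y) - \lambda \mathrm {h}(X) - (1-\lambda )\mathrm {h}(Y)\), which vanishes exactly when the two variances coincide. A minimal numerical sketch in Python, assuming only the closed-form entropy of a Gaussian:

```python
import numpy as np

def gaussian_entropy(var):
    # Differential entropy of N(0, var) in nats: 0.5 * log(2 * pi * e * var).
    return 0.5 * np.log(2 * np.pi * np.e * var)

def epi_deficit(var_x, var_y, lam):
    # delta_{EPI, lambda}(X, Y) for independent centered Gaussians:
    # h(sqrt(lam) X + sqrt(1 - lam) Y) - lam * h(X) - (1 - lam) * h(Y).
    var_mix = lam * var_x + (1 - lam) * var_y
    return (gaussian_entropy(var_mix)
            - lam * gaussian_entropy(var_x)
            - (1 - lam) * gaussian_entropy(var_y))

for var_x, var_y in [(1.0, 1.0), (1.0, 4.0), (0.5, 2.0)]:
    print(var_x, var_y, epi_deficit(var_x, var_y, 0.3))
# The deficit is zero only for equal variances and positive otherwise.
```

(For the form (2), proportional covariances already give equality; for the \(\lambda \)-weighted form above, equality forces identical covariances.)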

1.1 Our contribution

A measure is called log-concave if it is supported on some subspace of \({\mathbb {R}}^d\) and, relative to the Lebesgue measure of that subspace, it has a density f for which

$$\begin{aligned} -\nabla ^2 \ln (f(x)) \succeq 0 \text { for all } x, \end{aligned}$$

where \(\nabla ^2\) denotes the Hessian matrix, and the inequality is understood in the sense of positive semi-definite matrices. Our first result will rely on a slightly stronger condition known as uniform log-concavity. If there exists \(\xi > 0\) such that

$$\begin{aligned} -\nabla ^2 \ln (f(x)) \succeq {\xi }\mathrm {I}_d \text { for all } x, \end{aligned}$$

then we say that the measure is \(\xi \)-uniformly log-concave.

Theorem 1

Let X and Y be 1-uniformly log-concave centered vectors, and denote by \(\sigma ^2_X,\sigma ^2_Y\) the respective minimal eigenvalues of their covariance matrices. Then there exist Gaussian vectors \(G_X\) and \(G_Y\) such that for any \(\lambda \in (0,1)\),

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y)&\ge \frac{\lambda (1-\lambda )}{2}\left( \sigma _X^4\mathrm {D}\left( X||G_X\right) + \sigma _Y^4\mathrm {D}\left( Y||G_Y\right) \right. \\&\quad \left. + \frac{\sigma _X^4}{2}\mathrm {D}\left( G_X||G_Y\right) + \frac{\sigma _Y^4}{2}\mathrm {D}\left( G_Y||G_X\right) \right) . \end{aligned}$$

To compare this with the main result of [6] we recall the transportation-entropy inequality due to Talagrand [25] which states that

$$\begin{aligned} \mathcal {W}_2^2(X,G) \le 2\mathrm {D}(X||G). \end{aligned}$$

As a conclusion we get

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y)&\ge C_{\sigma _X,\sigma _Y} \frac{\lambda (1-\lambda )}{2}\left( \mathcal {W}_2^2\left( X,G_X\right) + \mathcal {W}_2^2\left( Y,G_Y\right) \right. \\&\quad \left. + \mathcal {W}_2^2\left( G_X,G_Y\right) \right) , \end{aligned}$$

where \(C_{\sigma _X,\sigma _Y}\) depends only on \(\sigma _X\) and \(\sigma _Y\). Up to this constant, this is precisely the main result of [6]. In fact, our method can reproduce their exact result, which we present as a warm up in the next section. We remark that as the underlying inequality is of information-theoretic nature, it is natural to expect that stability estimates are expressed in terms of relative entropy.

A random vector is isotropic if it is centered and its covariance matrix is the identity. By a re-scaling argument, the above theorem can be restated for uniformly log-concave isotropic random vectors.

Corollary 1

Let X and Y be \(\xi \)-uniformly log-concave and isotropic random vectors, then there exist Gaussian vectors \(G_X\) and \(G_Y\) such that for any \(\lambda \in (0,1)\)

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y)&\ge \frac{\lambda (1-\lambda )}{2}\xi ^2\left( \mathrm {D}\left( X||G_X\right) + \mathrm {D}\left( Y||G_Y\right) + \frac{1}{2}\mathrm {D}\left( G_X||G_Y\right) \right. \\&\quad \left. + \frac{1}{2}\mathrm {D}\left( G_Y||G_X\right) \right) . \end{aligned}$$

In our estimate for general log-concave vectors, the dependence on the parameter \(\xi \) will be replaced by the spectral gap of the measures. We say that a random vector X satisfies a Poincaré inequality if there exists a constant \(C>0\) such that

$$\begin{aligned} \mathrm {Var}(\psi (X)) \le C {\mathbb {E}}\left[ \left\Vert \nabla \psi (X)\right\Vert _2^2\right] , \text { for all test functions } \psi . \end{aligned}$$

We define \(C_p(X)\) to be the smallest number such that the above equation holds with \(C=C_p(X)\), and refer to this quantity as the Poincaré constant of X. The inverse quantity, \(C_p(X)^{-1}\) is referred to as the spectral gap of X.
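
For intuition, the Poincaré constant of a one-dimensional Gaussian \(N(0,\sigma ^2)\) equals \(\sigma ^2\), and the defining inequality can be checked by Monte Carlo for any particular test function. A minimal sketch in Python, where the test function is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 2.0                                   # X ~ N(0, sigma2), for which C_p(X) = sigma2
x = rng.normal(0.0, np.sqrt(sigma2), size=10**6)

psi = np.sin(x) + 0.3 * x**2                   # an arbitrary smooth test function
dpsi = np.cos(x) + 0.6 * x                     # its derivative

ratio = psi.var() / np.mean(dpsi**2)
print(ratio, "<=", sigma2)                     # Var(psi(X)) / E[|psi'(X)|^2] never exceeds C_p(X)
```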

Theorem 2

Let X and Y be centered log-concave vectors with \(\sigma ^2_X\), \(\sigma _Y^2\) denoting the minimal eigenvalues of their covariance matrices. Assume that \(\mathrm {Cov}(X) + \mathrm {Cov}(Y) =2\mathrm {I}_d\) and set \(\max \left( \frac{\mathrm {C_p}(X)}{\sigma _X^2},\frac{\mathrm {C_p}(Y)}{\sigma ^2_Y}\right) = \mathrm {C_p}\). Then, if G denotes the standard Gaussian, for every \(\lambda \in (0,1)\)

$$\begin{aligned}&\delta _{EPI, \lambda }(X,Y) \ge K\lambda (1-\lambda )\left( \frac{\min (\sigma ^2_Y,\sigma _X^2)}{\mathrm {C_p}}\right) ^3\left( \mathrm {D}\left( X||G\right) + \mathrm {D}\left( Y||G\right) \right) , \end{aligned}$$

where \(K >0\) is a numerical constant, which can be made explicit.

Remark 1

For \(\xi \)-uniformly log-concave vectors, we have the relation, \(\mathrm {C_p}(X) \le \frac{1}{\xi }\) (this is a consequence of the Brascamp-Lieb inequality [3], for instance). Thus, considering Corollary 1, one might have expected that the term \(\mathrm {C^3_p}\) could have been replaced by \(\mathrm {C^2_p}\) in Theorem 2. We do not know if either result is tight.

Remark 2

Bounding the Poincaré constant of an isotropic log-concave measure is the object of the long-standing Kannan-Lovász-Simonovits (KLS) conjecture (see [15, 17] for more information). The conjecture asserts that there exists a constant \(K >0\), independent of the dimension, such that for any isotropic log-concave vector X, \(\mathrm {C_p}(X) \le K\). The best known bound is due to Lee and Vempala, who showed in [18] that if X is a d-dimensional isotropic log-concave vector, then \(\mathrm {C_p}(X) = O \left( \sqrt{d}\right) .\)

Concerning the assumptions of Theorem 2: note that as the EPI is invariant under invertible linear transformations, there is no loss of generality in assuming \(\mathrm {Cov}(X) + \mathrm {Cov}(Y) = 2\mathrm {I}_d\). Note also that \(\mathrm {C_p}(X)\) is, approximately, proportional to the maximal eigenvalue of \(\mathrm {Cov}(X)\). Thus, for ill-conditioned covariance matrices, \(\frac{\mathrm {C_p}(X)}{\sigma _X^2}\) and \(\frac{\mathrm {C_p}(Y)}{\sigma ^2_Y}\) will not be on the same scale. It seems plausible to conjecture that the dependence on the minimal eigenvalue and the Poincaré constant could be replaced by a quantity which takes into consideration all eigenvalues.

Some other known stability results, both for log-concave vectors and for other classes of measures, may be found in [5, 6, 26]. The reader is referred to [6, Section 2.2] for a complete discussion. Let us mention one important special case, which is relevant to our results; the so-called entropy jump, first proved for the one dimensional case by Ball et al. [1] and then generalized by Ball and Nguyen to arbitrary dimensions in [2]. According to the latter result, if X is a log-concave and isotropic random vector, then

$$\begin{aligned} \delta _{EPI, \frac{1}{2}}(X,X) \ge \frac{1}{8\mathrm {C_p}(X)}\mathrm {D}(X||G), \end{aligned}$$

where \(\mathrm {C_p}(X)\) is the Poincaré constant of X and G is the standard Gaussian. This should be compared to both Corollary 1 and Theorem 2. That is, in the special case of two identical measures and \(\lambda = \frac{1}{2}\), their result gives a better dependence on the Poincaré constant than the one afforded by our results.

Ball and Nguyen [2] also give an interesting motivation for this type of inequality: they show that if, for some constant \(\kappa > 0\),

$$\begin{aligned} \delta _{EPI, \frac{1}{2}}(X,X) \ge \kappa \mathrm {D}(X||G), \end{aligned}$$

then the density \(f_X\) of X satisfies \(f_X(0) \le e^{\frac{2d}{\kappa }}\). The isotropic constant of X is defined by \(L_X := f_X(0)^{\frac{1}{d}}\), and is the main subject of the slicing conjecture, which hypothesizes that \(L_X\) is uniformly bounded by a constant, independent of the dimension, for every isotropic log-concave vector X. Ball and Nguyen observed that using the above fact in conjunction with an entropy jump estimate gives a bound on the isotropic constant in terms of the Poincaré constant; in particular, the slicing conjecture is implied by the KLS conjecture ([7] gives another proof of this reduction which applies to specific measures).

See [16, 20] for more results concerning the entropy jump, as well as connections and analogues to the discrete setting and additive combinatorics.

Our final results give improved bounds under the assumption that X and Y are already close to being Gaussian, in terms of relative entropy, or when one of them is Gaussian. We record these results in the following theorems.

Theorem 3

Suppose that X, Y are isotropic log-concave vectors such that \(\mathrm {C_p}(X),\mathrm {C_p}(Y) \le \mathrm {C_p}\) for some \(\mathrm {C_p} < \infty \). Suppose further that \(\mathrm {D}(X||G), \mathrm {D}(Y||G) \le \frac{1}{4}\). Then

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y) \ge \frac{\lambda (1-\lambda )}{36\mathrm {C_p}}\left( \mathrm {D}(X||G) + \mathrm {D}(Y||G)\right) \end{aligned}$$

The following gives an improved bound in the case that one of the random vectors is a Gaussian, and holds in full generality with respect to the other vector, without a log-concavity assumption.

Theorem 4

(Theorem 9 in [6]) Let X be a centered random vector with finite Poincaré constant, \(\mathrm {C_p}(X) < \infty \). Then

$$\begin{aligned} \delta _{EPI, \lambda }(X,G) \ge \left( \lambda - \frac{\lambda \left( \mathrm {C_p}(X) - 1\right) - \ln \left( \lambda \left( \mathrm {C_p}(X)-1\right) +1\right) }{\mathrm {C_p}(X) -\ln \left( \mathrm {C_p}(X)\right) - 1}\right) \mathrm {D}(X||G). \end{aligned}$$

Remark 3

When \(\mathrm {C_p}(X) \ge 1\), the following inequality holds

$$\begin{aligned} \left( \lambda - \frac{\lambda \left( \mathrm {C_p}(X) - 1\right) - \ln \left( \lambda \left( \mathrm {C_p}(X)-1\right) +1\right) }{\mathrm {C_p}(X) -\ln \left( \mathrm {C_p}(X)\right) - 1}\right) \ge \frac{\lambda (1-\lambda )}{\mathrm {C_p}(X)}. \end{aligned}$$
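
As a quick numerical comparison (a sketch for illustration, not a proof of the remark), one can evaluate the coefficient from Theorem 4 against \(\frac{\lambda (1-\lambda )}{\mathrm {C_p}(X)}\) on a grid of values:

```python
import numpy as np

def theorem4_coeff(lam, cp):
    # Coefficient multiplying D(X||G) in Theorem 4 (requires cp > 1).
    num = lam * (cp - 1.0) - np.log(lam * (cp - 1.0) + 1.0)
    den = cp - np.log(cp) - 1.0
    return lam - num / den

for cp in [1.5, 3.0, 10.0]:
    for lam in [0.1, 0.5, 0.9]:
        print(cp, lam, theorem4_coeff(lam, cp) >= lam * (1 - lam) / cp)
# All comparisons print True, in line with the remark.
```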

Remark 4

Theorem 4 was already proved in [6] using a slightly different approach. Denote by \(\mathrm {I}(X||G)\) the relative Fisher information of the random vector X. In [11], the authors prove the following improved log-Sobolev inequality:

$$\begin{aligned} \mathrm {I}(X||G) \ge 2\mathrm {D}(X||G)\frac{(1-\mathrm {C_p}(X))^2}{\mathrm {C_p}(X)\left( \mathrm {C_p}(X)-\ln \left( \mathrm {C_p}(X)\right) -1\right) }. \end{aligned}$$

The theorem follows by integrating the inequality along the Ornstein-Uhlenbeck semi-group.

1.2 Discussion and further directions for research

Perhaps the first question that arises in light of Theorem 2 is whether log-concavity is necessary. It should be noted that in dimension 1, in the special case that X has the same distribution as Y, log-concavity is not needed, see [1]. We do not have a counterexample to the conjecture that Theorem 2 holds true without any log-concavity assumption.

A very interesting related question is to try to characterize the approximate equality cases of the Shannon–Stam inequality. In a recent paper [9], it is shown that measures which almost saturate the log-Sobolev inequality are close, in Wasserstein distance, to mixtures of isotropic Gaussians. It is natural to ask whether this is also the case for the Shannon–Stam inequality. In other words, could it be that the Wasserstein distances of both X and Y from a mixture of Gaussians can be bounded by some function of \(\delta _{EPI, \lambda }(X,Y)\)? We provide a short discussion following Lemma 2 which gives a heuristic towards such a result.

Finally, a natural question which was mentioned above is to understand the optimal dependence on the Poincaré constant in Theorem 2. It is plausible that the dependence \(\mathrm {C_p}^3\) can be replaced by \(\mathrm {C_p}\). This is supported by the result of Ball and Nguyen [2]. On a related note, in the context of Theorem 3, it makes sense to ask whether a related bound (even a dimension-dependent one) can be attained when the assumption \(\mathrm {D}(X||G), \mathrm {D}(Y||G) \le \frac{1}{4}\) is relaxed to \(\mathrm {D}(X||G), \mathrm {D}(Y||G) \le \frac{d}{4}\), for example.

2 Bounding the deficit via martingale embeddings

Our approach is based on ideas somewhat related to the ones which appear in [9, 10, 14]: the very high-level plan of the proof is to embed the variables X, Y as the terminal points of some martingales and express the entropies of X, Y and \(X+Y\) as functions of the associated quadratic covariation processes. One of the main benefits of using such an embedding is that the covariation process of \(X+Y\) can easily be expressed in terms of those of X and Y, as demonstrated below. In [10] these ideas were used to produce upper bounds for the entropic central limit theorem, so it stands to reason that related methods may be useful here. It turns out, however, that in order to produce meaningful bounds for the Shannon–Stam inequality, one needs a more intricate analysis, since this inequality corresponds to a second-derivative phenomenon: whereas for the CLT one only needs to produce upper bounds on the relative entropy, here we need to be able to compare, in a non-asymptotic way, two relative entropies.

In particular, our martingale embedding is constructed using the entropy-minimizing technique developed by Föllmer [12, 13] and later Lehec [19]. This construction has several useful features, one of which is that it allows us to express the relative entropy of a measure on \({\mathbb {R}}^d\) in terms of a variational problem on the Wiener space. In addition, upon taking a slightly different point of view on this process, which we introduce here, the behavior of this variational expression turns out to be tractable with respect to convolutions. The reader is referred to [22] for the necessary background in stochastic calculus.

In order to outline the argument, fix centered measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^d\) with finite second moment. Let \(X \sim \mu \), \(Y \sim \nu \) be random vectors and \(G \sim \gamma \) a standard Gaussian random vector.

An entropy-minimizing drift Let \(B_t\) be a standard Brownian motion on \({\mathbb {R}}^d\) and denote by \(\mathcal {F}_t\) its natural filtration. In the sequel, the following process plays a fundamental role:

$$\begin{aligned} v^X_t = \arg \min \limits _{u_t} \frac{1}{2}\int \limits _0^1{\mathbb {E}}\left[ \left\Vert u_t\right\Vert _2^2\right] dt, \end{aligned}$$
(4)

where the minimum is taken with respect to all processes \(u_t\) adapted to \(\mathcal {F}_t\), such that

$$\begin{aligned} B_1 + \int \limits _0^1u_tdt \sim \mu . \end{aligned}$$

Amazingly, under mild assumptions on \(\mu \), and in particular in the case that \(\mu \) is log-concave, there exists a unique minimizer to Eq. (4), from which we construct the process

$$\begin{aligned} X_t := B_t + \int \limits _0^tv^X_sds, \end{aligned}$$

also known as the Föllmer process, with \(v_t^X\) being the associated Föllmer drift. We refer the reader to [19] for proofs of the existence and uniqueness of the process, as well as of a few other facts summarized below.

It turns out that the process \(v_t^X\) is a martingale [which goes together with the fact that it minimizes the quadratic form in (4)] which is given by the equation

$$\begin{aligned} v_t^X = \nabla _x \ln \left( P_{1-t}f_X\right) (X_t), \end{aligned}$$
(5)

where \(f_X\) is the density of X with respect to the standard Gaussian and \(P_{1-t}\) denotes the heat semi-group,

$$\begin{aligned} P_{1-t}f(x) = {\mathbb {E}}\left[ f(x + B_{1-t})\right] . \end{aligned}$$

In fact, Girsanov’s formula gives a very useful relation between the energy of the drift and the entropy of X, namely,

$$\begin{aligned} \mathrm {D}(X||G) = \frac{1}{2}\int \limits _0^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt. \end{aligned}$$
(6)

This gives the following alternative interpretation for the process: suppose that the Wiener space is equipped with an underlying probability measure P, with respect to which the process \(B_t\) is a Brownian motion as above. Let Q be a measure on Wiener space such that

$$\begin{aligned} \frac{dP}{dQ} = \frac{d \mu }{d \gamma } (X_1), \end{aligned}$$

then the process \(X_t\) is a Brownian motion with respect to the measure Q. By the representation theorem for the Brownian bridge, this tells us that the process \(X_t\) conditioned on \(X_1\) is a Brownian bridge between 0 and \(X_1\). In particular, we have

$$\begin{aligned} X_t {\mathop {=}\limits ^{d}} tX_1 +\sqrt{ t(1-t)}G. \end{aligned}$$
(7)
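
To make the construction concrete, when \(\mu = N(0,s)\) in dimension one the drift (5) can be computed in closed form, \(v_t(x) = \frac{(s-1)x}{1-t+ts}\), and an Euler–Maruyama discretization of \(X_t = B_t + \int _0^t v^X_s ds\) produces a terminal point with variance close to s. A minimal sketch under this Gaussian assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
s = 2.5                      # target measure mu = N(0, s)
n_paths, n_steps = 20000, 1000
dt = 1.0 / n_steps

def drift(x, t):
    # Foellmer drift for a N(0, s) target (closed form under this Gaussian assumption):
    # v_t(x) = grad_x log P_{1-t} f(x) = (s - 1) * x / (1 - t + t * s).
    return (s - 1.0) * x / (1.0 - t + t * s)

x = np.zeros(n_paths)
for k in range(n_steps):
    t = k * dt
    x += drift(x, t) * dt + np.sqrt(dt) * rng.normal(size=n_paths)

print(x.var())               # close to s = 2.5, i.e. X_1 approximately has law mu
```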

Lehec’s proof of the Shannon–Stam inequality For the sake of intuition, we now repeat Lehec’s argument to reproduce the Shannon–Stam inequality (3) using this process. Let \(X_t := B^X_t + \int \limits _0^tv^X_sds\) and \(Y_t := B^Y_t + \int \limits _0^tv^Y_sds\) be the Föllmer processes associated to X and Y, where \(B_t^X\) and \(B_t^Y\) are independent Brownian motions. For \(\lambda \in (0,1)\), define the new processes

$$\begin{aligned} w_t = \sqrt{\lambda } v_t^X + \sqrt{1-\lambda }v_t^Y, \end{aligned}$$

and

$$\begin{aligned} {\tilde{B}}_t = \sqrt{\lambda }B_t^X + \sqrt{1-\lambda }B_t^Y. \end{aligned}$$

By the independence of \(B_t^X\) and \(B_t^Y\), \({\tilde{B}}_t\) is a Brownian motion and

$$\begin{aligned} {\tilde{B}}_1 + \int \limits _0^1 w_tdt = \sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1. \end{aligned}$$

Note that since \(v_t^X\) is a martingale, we have for every \(t\in [0,1]\),

$$\begin{aligned} {\mathbb {E}}\left[ v_t^X\right] = {\mathbb {E}}\left[ X_1\right] = 0. \end{aligned}$$

Using Eqs. (4) and (6) and recalling that the processes are independent, we finally have

$$\begin{aligned} \mathrm {D}(\sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1||G)&\le \frac{1}{2}\int \limits _0^1{\mathbb {E}}\left[ \left\Vert w_t\right\Vert _2^2\right] dt \\&= \frac{\lambda }{2}\int {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt + \frac{1-\lambda }{2}\int {\mathbb {E}}\left[ \left\Vert v_t^Y\right\Vert _2^2\right] dt \\&= \lambda \mathrm {D}(X_1||G) + (1-\lambda )\mathrm {D}(Y_1||G). \end{aligned}$$

This recovers the Shannon–Stam inequality in the form (3).

An alternative point of view: replacing the drift by a varying diffusion coefficient Lehec’s proof gives rise to the following idea: suppose the processes \(v_t^X\) and \(v_t^Y\) could be coupled (rather than taken to be independent) in such a way that the second moment of the resulting process \(\sqrt{\lambda } v_t^X + \sqrt{1-\lambda }v_t^Y\) becomes smaller than that of \(w_t\) above. Such a coupling would improve on (3), and that is the starting point of this work.

As it turns out, however, it is easier to get tractable bounds by working with a slightly different interpretation of the above processes, in which the role of the drift is taken by an adapted diffusion coefficient of a related process.

The idea is as follows: Suppose that \(M_t := \int \limits _0^t F_sdB_s\) is a martingale, where \(F_t\) is some positive-definite matrix valued process adapted to \(\mathcal {F}_t\). Consider the drift defined by

$$\begin{aligned} u_t := \int \limits _0^t\frac{F_s - \mathrm {I}_d}{1-s}dB_s. \end{aligned}$$
(8)

We then claim that \(B_1 + \int \limits _{0}^1u_tdt = M_1\). To show this, we use the stochastic Fubini Theorem [27] to write

$$\begin{aligned} \int \limits _0^1F_tdB_t = \int \limits _0^1\mathrm {I}_ddB_t+\int \limits _0^1\left( F_t-\mathrm {I}_d\right) dB_t = B_1 + \int \limits _0^1\int \limits _t^1 \frac{F_t - \mathrm {I}_d}{1-t}dsdB_t = B_1 + \int \limits _0^1u_tdt. \end{aligned}$$

Since we now expressed the random variable \(M_1\) as the terminal point of a standard Brownian motion with an adapted drift, the minimality property of the Föllmer drift together with Eq. (6) immediately produces a bound on its entropy. Namely, by using Itô’s isometry and Fubini’s theorem we have the bound

$$\begin{aligned} \mathrm {D}(M_1||G) {\mathop {\le }\limits ^{(6)}} \frac{1}{2}\int \limits _{0}^1{\mathbb {E}}\left[ \left\Vert u_t\right\Vert _2^2\right] dt= & {} \frac{1}{2} \mathrm {Tr}\int \limits _0^1\int \limits _0^t\frac{{\mathbb {E}}\left[ \left( F_s - \mathrm {I}_d\right) ^2\right] }{(1-s)^2}dsdt \nonumber \\= & {} \frac{1}{2} \mathrm {Tr}\int \limits _{0}^1\frac{{\mathbb {E}}\left[ \left( F_t - \mathrm {I}_d\right) ^2\right] }{1-t}dt. \end{aligned}$$
(9)

This hints at the following possible scheme of proof: in order to give an upper bound for the expression \(\mathrm {D}(\sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1||G)\), it suffices to find martingales \(M_t^X\) and \(M_t^Y\) such that \(M_1^{X}, M_1^Y\) have the laws of X and Y, respectively, and such that the \(\lambda \)-average of the covariance processes is close to the identity.

The Föllmer process gives rise to a natural martingale: Consider \({\mathbb {E}}\left[ X_1|\mathcal {F}_t\right] \), the associated Doob martingale. By the martingale representation theorem ([22, Theorem 4.3.3]) there exists a uniquely defined adapted matrix valued process \(\varGamma _t^X\), for which

$$\begin{aligned} {\mathbb {E}}\left[ X_1|\mathcal {F}_t\right] = \int \limits _0^t\varGamma ^X_sdB_s^X. \end{aligned}$$
(10)

By following the construction in (8) and considering the process \({\tilde{v}}_t^X := \int \limits _0^t\frac{\varGamma ^X_s -\mathrm {I}_d}{1-s}dB^X_s\), it is immediate that \(B_1 + \int \limits _{0}^1{\tilde{v}}_t^Xdt = X_1\). Observe that \(v_t^X - {\tilde{v}}_t^X\) is a martingale and that for every \(t \in [0,1]\), \({\mathbb {E}}\left[ \int \limits _t^1(v_s^X - {\tilde{v}}_s^X)ds\Big |\mathcal {F}_t\right] = 0\) almost surely. It thus follows that \(v_t^X\) and \({{\tilde{v}}}_t^X\) are almost surely the same process. We conclude the following representation for the Föllmer drift,

$$\begin{aligned} v_t^X = \int \limits _0^t\frac{\varGamma ^X_s -\mathrm {I}_d}{1-s}dB^X_s. \end{aligned}$$
(11)

The matrix \(\varGamma _t^X\) turns out to be positive definite almost surely (in fact, it has a simple explicit representation; see Proposition 1 below), which yields, by combining (6) with the same calculation as in (9),

$$\begin{aligned} \mathrm {D}(X||G) = \frac{1}{2}\int \limits _0^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X-\mathrm {I}_d \right) ^2\right] \right) }{1-t}dt. \end{aligned}$$
(12)
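
As a sanity check of (12), for a one-dimensional Gaussian target \(N(0,s)\) the process \(\varGamma _t\) turns out to be deterministic, \(\varGamma _t = \frac{s}{1-t+ts}\) (a consequence of the conditional-variance representation of \(\varGamma _t\) in Proposition 1 below, under this Gaussian assumption), and a simple quadrature of the right-hand side of (12) reproduces \(\mathrm {D}(N(0,s)||N(0,1)) = \frac{1}{2}(s - 1 - \ln s)\). A minimal sketch:

```python
import numpy as np

s = 2.0                                   # target N(0, s); here Gamma_t = s / (1 - t + t * s)
t = np.linspace(0.0, 1.0, 200001)[:-1]    # drop t = 1, where the integrand is 0/0 (its limit is 0)
gamma = s / (1.0 - t + t * s)
integrand = (gamma - 1.0) ** 2 / (1.0 - t)

dt = t[1] - t[0]
quadrature = 0.5 * np.sum(integrand) * dt    # right-hand side of (12) with d = 1
exact_kl = 0.5 * (s - 1.0 - np.log(s))       # D(N(0, s) || N(0, 1))
print(quadrature, exact_kl)                  # agree up to discretization error
```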

Given the processes \(\varGamma _t^X\) and \(\varGamma _t^Y\), we are now in position to express \(\sqrt{\lambda } X + \sqrt{1-\lambda } Y\) as the terminal point of a martingale, towards using (9), which would lead to a bound on \(\delta _{EPI,\lambda }\). We define

$$\begin{aligned} \tilde{\varGamma _t} := \sqrt{\lambda \left( \varGamma _t^X\right) ^2 + (1-\lambda )\left( \varGamma _t^Y\right) ^2}, \end{aligned}$$

and a martingale \({\tilde{B}}_t\) which satisfies

$$\begin{aligned} {\tilde{B}}_0 = 0 \text { and } d{\tilde{B}}_t = {\tilde{\varGamma }}^{-1}_t\left( \sqrt{\lambda }\varGamma _t^XdB_t^X + \sqrt{1-\lambda }\varGamma _t^YdB_t^Y\right) . \end{aligned}$$

Since \(\varGamma _t^X\) and \(\varGamma _t^Y\) are invertible almost surely and independent, it holds that

$$\begin{aligned}{}[{\tilde{B}}]_t = t\mathrm {I}_d, \end{aligned}$$

where \([{\tilde{B}}]_t\) denotes the quadratic covariation of \({\tilde{B}}_t\). Thus, by Lévy's characterization, \({\tilde{B}}_t\) is a standard Brownian motion and we have the following equality in law

$$\begin{aligned} \int \limits _0^1\tilde{\varGamma _t}d{\tilde{B}}_t = \sqrt{\lambda }\int \limits _0^1\varGamma _t^XdB_t^X + \sqrt{1-\lambda }\int \limits _{0}^1\varGamma _t^YdB_t^Y {\mathop {=}\limits ^{d}} \sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1. \end{aligned}$$

We can now invoke (9) to get

$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1\big |\big |G\right) \le \frac{1}{2} \int \limits _{0}^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \tilde{\varGamma _t} - \mathrm {I}_d\right) ^2\right] \right) }{1-t}dt. \end{aligned}$$

Combining this with the identity (12) finally gives a bound on the deficit in the Shannon–Stam inequality, in the form

$$\begin{aligned}&\delta _{EPI,\lambda }(X,Y) \nonumber \\&\quad \ge \frac{1}{2}\int \limits _0^1\frac{\mathrm {Tr}\left( \lambda {\mathbb {E}}\left[ \left( \varGamma _t^X - \mathrm {I}_d\right) ^2\right] +(1-\lambda ){\mathbb {E}}\left[ \left( \varGamma _t^Y - \mathrm {I}_d\right) ^2\right] - {\mathbb {E}}\left[ \left( \tilde{\varGamma _t} - \mathrm {I}_d\right) ^2\right] \right) }{1-t}dt \nonumber \\&\quad = \int \limits _0^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \tilde{\varGamma _t}\right] - \lambda {\mathbb {E}}\left[ \varGamma ^X_t\right] - (1-\lambda ){\mathbb {E}}\left[ \varGamma _t^Y\right] \right) }{1-t}dt. \end{aligned}$$
(13)

The following technical lemma will allow us to give a lower bound for the right hand side in terms of the variances of the processes \(\varGamma _t^X, \varGamma _t^Y\). Its proof is postponed to the end of the section.

Lemma 1

Let A and B be positive definite matrices and denote

$$\begin{aligned} (A,B)_\lambda : = \lambda A+(1-\lambda )B \text { and } (A^2,B^2)_\lambda : = \lambda A^2+(1-\lambda )B^2. \end{aligned}$$

Then

$$\begin{aligned} \mathrm {Tr}\left( \sqrt{(A^2,B^2)_\lambda } - (A,B)_\lambda \right) = \lambda (1-\lambda )\mathrm {Tr}\left( \left( A- B\right) ^2\left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) . \end{aligned}$$
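
Before applying the lemma, here is a quick numerical sanity check of the identity on randomly generated positive definite matrices (a sketch, separate from the proof given below):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_pd(d):
    # A random symmetric positive definite matrix.
    m = rng.normal(size=(d, d))
    return m @ m.T + np.eye(d)

def psd_sqrt(m):
    # Symmetric square root via an eigendecomposition.
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(w)) @ v.T

d, lam = 5, 0.3
A, B = random_pd(d), random_pd(d)
sq_mean = psd_sqrt(lam * A @ A + (1 - lam) * B @ B)   # sqrt of (A^2, B^2)_lambda
lin_mean = lam * A + (1 - lam) * B                    # (A, B)_lambda

lhs = np.trace(sq_mean - lin_mean)
rhs = lam * (1 - lam) * np.trace(
    (A - B) @ (A - B) @ np.linalg.inv(sq_mean + lin_mean))
print(lhs, rhs)                                       # equal up to numerical error
```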

Combining the lemma with the estimate obtained in (13) produces the following result, which will be our main tool in studying \(\delta _{EPI, \lambda }\).

Lemma 2

Let X and Y be centered random vectors on \({\mathbb {R}}^d\) with finite second moment, and let \(\varGamma _t^X, \varGamma _t^Y\) be defined as above. Then,

$$\begin{aligned}&\delta _{EPI,\lambda }(X,Y) \nonumber \\&\quad \ge \lambda (1-\lambda )\int \limits _{0}^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\left( \sqrt{\lambda \left( \varGamma _t^X\right) ^2 + (1-\lambda )\left( \varGamma _t^Y\right) ^2} + \lambda \varGamma _t^X + (1-\lambda )\varGamma _t^Y\right) ^{-1}\right] \right) }{1-t}dt. \end{aligned}$$
(14)

The expression on the right-hand side of (14) may seem unwieldy; however, in many cases it can be simplified. For example, if it can be shown that, almost surely, \(\varGamma _t^X, \varGamma _t^Y \preceq c_t\mathrm {I}_d\) for some deterministic \(c_t > 0\), then the matrix inside the inverse in (14) is bounded from above by \(2c_t\mathrm {I}_d\), and we obtain the more tractable inequality

$$\begin{aligned} \delta _{EPI,\lambda }(X,Y)\ge \frac{\lambda (1-\lambda )}{2}\int \limits _{0}^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] \right) }{(1-t)c_t}dt. \end{aligned}$$
(15)

As we will show, this is the case when the random vectors are log-concave.

Let us now take a small detour to discuss a heuristic idea for a possible extension of our results towards characterizing the approximate equality cases in the Shannon–Stam inequality (as alluded to in Sect. 1.2). Using Eq. (16) below, it can be shown that after a small period of time, the quantities \( \Vert {\mathbb {E}}\left[ \varGamma _t^X \right] \Vert _{op} \) and \(\Vert {\mathbb {E}}\left[ \varGamma _t^Y\right] \Vert _{op}\) become small. This suggests that it may be the case that for any X and Y (hence, even without a log-concavity assumption or a bound on the Poincaré constant), it holds that

$$\begin{aligned} \delta _{EPI,\lambda }(X,Y)\ge c(t_0) \frac{\lambda (1-\lambda )}{2} \int \limits _{t_0}^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] \right) }{(1-t)}dt. \end{aligned}$$

where \(\lim _{t_0 \rightarrow 0} c(t_0) = 0\). Our techniques show that, in this case, the laws of random variables \(X_1 | \mathcal {F}_{t_0}\) and \(Y_1 | \mathcal {F}_{t_0}\) are close to Gaussian (with high probability with respect to \(\mathcal {F}_{t_0}\)), which would imply in this case that X and Y are close to mixtures of Gaussians. A related version of this idea appears in [9] and [21].

Proof of Lemma 1

We have

$$\begin{aligned}&\mathrm {Tr}\left( \sqrt{(A^2,B^2)_\lambda } - (A,B)_\lambda \right) \\&\quad = \mathrm {Tr}\left( \left( \sqrt{(A^2,B^2)_\lambda } - (A,B)_\lambda \right) \left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) \right. \\&\qquad \left. \left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) . \end{aligned}$$

As

$$\begin{aligned}&\big (\sqrt{(A^2,B^2)_\lambda } - (A,B)_\lambda \big )\left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) \\&\quad = \lambda (1-\lambda )\left( A^2 + B^2- AB - BA\right) +\sqrt{(A^2,B^2)_\lambda }(A,B)_\lambda - (A,B)_\lambda \sqrt{(A^2,B^2)_\lambda }, \end{aligned}$$

we have the equality

$$\begin{aligned}&\mathrm {Tr}\left( \sqrt{(A^2,B^2)_\lambda } - (A,B)_\lambda \right) \\&\quad = \lambda (1-\lambda )\mathrm {Tr}\left( \left( A^2 + B^2- \left( AB + BA\right) \right) \left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) \\&\qquad +\mathrm {Tr}\left( \sqrt{(A^2,B^2)_\lambda }(A,B)_\lambda \left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) \\&\qquad -\mathrm {Tr}\left( (A,B)_\lambda \sqrt{(A^2,B^2)_\lambda }\left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) \end{aligned}$$

Finally, as the trace is invariant under any permutation of three symmetric matrices we have that

$$\begin{aligned}&\mathrm {Tr}\left( AB\left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) \nonumber \\&\quad = \mathrm {Tr}\left( BA\left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) , \end{aligned}$$

and

$$\begin{aligned}&\mathrm {Tr}\Big (\sqrt{(A^2,B^2)_\lambda }(A,B)_\lambda \left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\Big )\\&\quad =\mathrm {Tr}\left( (A,B)_\lambda \sqrt{(A^2,B^2)_\lambda }\left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} \mathrm {Tr}\left( \sqrt{(A^2,B^2)_\lambda } - (A,B)_\lambda \right) =\lambda (1-\lambda )\mathrm {Tr}\left( \left( \left( A - B\right) ^2\right) \left( \sqrt{(A^2,B^2)_\lambda } + (A,B)_\lambda \right) ^{-1}\right) , \end{aligned}$$

as required. \(\square \)

2.1 The Föllmer process associated to log-concave random vectors

In this section, we collect several results pertaining to the Föllmer process. Throughout the section, we fix a random vector X in \({\mathbb {R}}^d\) and associate to it the Föllmer process \(X_t\), defined in the previous section, as well as the process \(\varGamma ^X_t\), defined in Eq. (10) above. The next result lists some of its basic properties, and we refer to [8, 10] for proofs.

Proposition 1

For \(t \in (0,1)\) define

$$\begin{aligned} f^t_X(x) := f_X(x)\exp \left( \frac{\left\Vert x\right\Vert _2^2}{2}-\frac{\left\Vert x-X_t\right\Vert _2^2}{2(1-t)}\right) Z_{t,X}^{-1}, \end{aligned}$$

where \(f_X\) is the density of X with respect to the standard Gaussian and \(Z_{t,X}\) is a normalizing constant defined so that \(\int \limits _{{\mathbb {R}}^d} f_X^t = 1\). Then

  1. 1.

    \(f_X^t\) is the density of the random measure \(\mu _t := X_1|\mathcal {F}_t\) with respect to the standard Gaussian and \(\varGamma ^X_t = \frac{\mathrm {Cov}\left( \mu _t\right) }{1-t}\).

  2. 2.

    \(\varGamma ^X_t\) is almost surely a positive definite matrix, in particular, it is invertible.

  3. 3.

    For all \(t \in (0,1)\), we have

    $$\begin{aligned} \frac{d}{dt}{\mathbb {E}}\left[ \varGamma ^X_t\right] = \frac{{\mathbb {E}}\left[ \varGamma ^X_t\right] - {\mathbb {E}}\left[ \left( \varGamma ^X_t\right) ^2\right] }{1-t}. \end{aligned}$$
    (16)
  4. 4.

    The following identity holds

    $$\begin{aligned} {\mathbb {E}}\left[ v_t^X\otimes v_t^X\right] = \frac{\mathrm {I}_d - {\mathbb {E}}\left[ \varGamma ^X_t\right] }{1-t} + \mathrm {Cov}(X) - \mathrm {I}_d, \end{aligned}$$
    (17)

    for all \(t \in [0,1]\). In particular, if \(\mathrm {Cov}(X) \preceq \mathrm {I}_d\), then \({\mathbb {E}}\left[ \varGamma ^X_t\right] \preceq \mathrm {I}_d\).

In what follows, we restrict ourselves to the case that X is log-concave. Using this assumption we will establish several important properties for the matrix \(\varGamma _t\). For simplicity, we will write \(\varGamma _t := \varGamma _t^X\) and \(v_t := v_t^X\). The next result shows that the matrix \(\varGamma _t\) is bounded almost surely.

Lemma 3

Suppose that X is log-concave, then for every \(t \in (0,1)\)

$$\begin{aligned} \varGamma _t \preceq \frac{1}{t}\mathrm {I}_d. \end{aligned}$$

Moreover, if for some \(\xi >0\), X is \(\xi \)-uniformly log-concave then

$$\begin{aligned} \varGamma _t \preceq \frac{1}{(1-t)\xi + t}\mathrm {I}_d. \end{aligned}$$

Proof

By Proposition 1, \(\mu _t\), the law of \(X_1|\mathcal {F}_t\), has a density \(\rho _t\) with respect to the Lebesgue measure, proportional to \(f_X^t(x)e^{-\left\Vert x\right\Vert _2^2/2}\), that is, to

$$\begin{aligned} f_X(x) \exp \left( -\frac{\left\Vert x-X_t\right\Vert _2^2}{2(1-t)}\right) . \end{aligned}$$

Consequently, since X is log-concave and \(f_X\) is its density with respect to the standard Gaussian, we have \(-\nabla ^2 \ln f_X \succeq -\mathrm {I}_d\), and therefore

$$\begin{aligned} -\nabla ^2\ln \left( \rho _t\right) = -\nabla ^2 \ln f_X + \frac{1}{1-t}\mathrm {I}_d \succeq \left( \frac{1}{1-t}-1\right) \mathrm {I}_d = \frac{t}{1-t}\mathrm {I}_d. \end{aligned}$$

It follows that, almost surely, \(\mu _t\) is \(\frac{t}{1-t}\)-uniformly log-concave. According to the Brascamp-Lieb inequality [3] \(\alpha \)-uniform log-concavity implies a spectral gap of \(\alpha \), and in particular \(\text {Cov}(\mu _t) \preceq \frac{1 - t}{t}\mathrm {I}_d\) and so, \(\varGamma _t = \frac{\mathrm {Cov}(\mu _t)}{1-t} \preceq \frac{1}{t}\mathrm {I}_d\). If, in addition, X is \(\xi \)-uniformly log-concave, so that \(-\nabla ^2 \ln f_X \succeq (\xi -1)\mathrm {I}_d\), then we may write

$$\begin{aligned} -\nabla ^2\ln (\rho _t) \succeq \left( \xi + \frac{t}{1-t}\right) \mathrm {I}_d = \frac{(1-t)\xi +t}{(1-t)}\mathrm {I}_d \end{aligned}$$

and the arguments given above show \(\text {Cov}(\mu _t) \preceq \frac{(1-t)}{(1-t)\xi + t}\mathrm {I}_d\). Thus,

$$\begin{aligned} \varGamma _t \preceq \frac{1}{(1-t)\xi + t}\mathrm {I}_d. \end{aligned}$$

\(\square \)

Our next goal is to use the formulas given in the above lemma in order to bound from below the expectation of \(\varGamma _t\). We begin with a simple corollary.

Corollary 2

Suppose that X is 1-uniformly log-concave, then for every \(t \in [0,1]\)

$$\begin{aligned} {\mathbb {E}}\left[ \varGamma _t\right] \succeq \mathrm {Cov}(X). \end{aligned}$$

Proof

By (16), we have

$$\begin{aligned} \frac{d}{dt}{\mathbb {E}}\left[ \varGamma _t\right] = \frac{{\mathbb {E}}\left[ \varGamma _t\right] - {\mathbb {E}}\left[ \varGamma _t^2\right] }{1-t}. \end{aligned}$$

By Lemma 3, \(\varGamma _t\preceq \mathrm {I}_d\), which shows

$$\begin{aligned} \frac{d}{dt}{\mathbb {E}}\left[ \varGamma _t\right] \succeq 0. \end{aligned}$$

Thus, for every t,

$$\begin{aligned} {\mathbb {E}}\left[ \varGamma _t\right] \succeq {\mathbb {E}}\left[ \varGamma _0\right] = \mathrm {Cov}(X|\mathcal {F}_0) = \mathrm {Cov}(X). \end{aligned}$$

\(\square \)

To produce similar bounds for general log-concave random vectors, we require more intricate arguments. Recall that \(\mathrm {C_p}(X)\) denotes the Poincaré constant of X.

Lemma 4

If X is centered and has a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \), then

$$\begin{aligned} {\mathbb {E}}\left[ v_t^{\otimes 2}\right] \preceq \left( t^2\mathrm {C_p}(X)+t(1-t)\right) \frac{d}{dt}{\mathbb {E}}\left[ v_t^{\otimes 2}\right] . \end{aligned}$$

Proof

Recall that, by Eq. (7), we know that \(X_t\) has the same law as \(tX_1 + \sqrt{t(1-t)}G\), where G is a standard Gaussian independent of \(X_1\). Since \(\mathrm {C_p}(tX) = t^2\mathrm {C_p}(X) \) and since the Poincaré constant is sub-additive with respect to convolution [4] we get

$$\begin{aligned} \mathrm {C_p}(X_t) \le t^2\mathrm {C_p}(X) + t(1-t). \end{aligned}$$

The drift, \(v_t\), is a function of \(X_t\) and \({\mathbb {E}}\left[ v_t\right] = 0\). Equation (5) implies that \(\nabla _x v_t(X_t)\) is a symmetric matrix, hence the Poincaré inequality yields

$$\begin{aligned} {\mathbb {E}}\left[ v_t^{\otimes 2}\right] \preceq \left( t^2\mathrm {C_p}(X)+t(1-t)\right) {\mathbb {E}}\left[ \nabla _x v_t(X_t)^2\right] . \end{aligned}$$

As \(v_t(X_t)\) is a martingale, by Itô’s lemma we have

$$\begin{aligned} dv_t(X_t) = \nabla _xv_t(X_t)dB_t. \end{aligned}$$

An application of Itô’s isometry then shows

$$\begin{aligned} {\mathbb {E}}\left[ \nabla _x v_t(X_t)^2\right] = \frac{d}{dt}{\mathbb {E}}\left[ v_t(X_t)^{\otimes 2}\right] , \end{aligned}$$

where we have again used the fact that \(\nabla _x v_t(X_t)\) is symmetric. \(\square \)

Using the last lemma, we can deduce lower bounds on the matrix \(\varGamma _t^X\) in terms of the Poincaré constant.

Corollary 3

Suppose that X is log-concave and that \(\sigma ^2\) is the minimal eigenvalue of \(\mathrm {Cov}(X)\). Then,

  1. 1.

    For every \(t \in \left[ 0,\frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1}\right] \), \({\mathbb {E}}\left[ \varGamma _t\right] \succeq \frac{\min (1,\sigma ^2)}{3}\mathrm {I}_d.\)

  2. 2.

    For every \(t \in \left[ \frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1}, 1\right] \), \({\mathbb {E}}\left[ \varGamma _t\right] \succeq \frac{\min (1,\sigma ^2)}{3}\frac{1}{t\left( 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1\right) }\mathrm {I}_d\).

Proof

Using Equation (11), Itô’s isometry and the fact that \(\varGamma _t\) is symmetric, we deduce that

$$\begin{aligned} \frac{d}{dt}{\mathbb {E}}\left[ v_t^{\otimes 2}\right] = {\mathbb {E}}\left[ \left( \frac{\varGamma _t - \mathrm {I}_d}{1-t}\right) ^2\right] , \end{aligned}$$

Combining this with equation (17) and using Lemma 4, we get

$$\begin{aligned} \mathrm {Cov}(X) - \mathrm {I}_d + \frac{\mathrm {I}_d - {\mathbb {E}}\left[ \varGamma _t\right] }{1-t} \preceq \left( t^2\mathrm {C_p}(X)+t(1-t)\right) \frac{{\mathbb {E}}\left[ \varGamma _t^2\right] -2{\mathbb {E}}\left[ \varGamma _t\right] +\mathrm {I}_d}{(1-t)^2}.\nonumber \\ \end{aligned}$$
(18)

In the case where X is log-concave, by Lemma 3, \(\varGamma _t \preceq \frac{1}{t}\mathrm {I}_d\) almost surely, therefore \({\mathbb {E}}\left[ \varGamma _t^2\right] \preceq \frac{1}{t}{\mathbb {E}}\left[ \varGamma _t\right] \). The above inequality then becomes

$$\begin{aligned}&(1-t)^2\left( \sigma ^2 - 1\right) \mathrm {I}_d + (1-t)(\mathrm {I}_d - {\mathbb {E}}\left[ \varGamma _t\right] )\\&\quad \preceq \left( t\mathrm {C_p}(X)+(1-t)\right) {\mathbb {E}}\left[ \varGamma _t\right] +\left( t^2\mathrm {C_p}(X)+t(1-t)\right) \left( \mathrm {I}_d -2{\mathbb {E}}\left[ \varGamma _t\right] \right) . \end{aligned}$$

Rearranging the inequality shows

$$\begin{aligned} \frac{\sigma ^2 - 2t\sigma ^2 - \mathrm {C_p}(X)t^2\ + t^2\sigma ^2}{2-4t-2\mathrm {C_p}(X)t^2 + \mathrm {C_p}(X)t +2t^2}\mathrm {I}_d \preceq {\mathbb {E}}\left[ \varGamma _t\right] . \end{aligned}$$

As long as \(t \le \frac{1}{ 2\left( \frac{\mathrm {C_p}(X)}{\sigma ^2}\right) +1}\), we have

$$\begin{aligned}&\text {if } \sigma ^2\ge 1, \ \ \ \ \frac{1}{3}\mathrm {I}_d\preceq \frac{\sigma ^2\left( 4\mathrm {C_p}(X) - \sigma ^2\right) }{2\mathrm {C_p}(X)(\sigma ^2 + 4)-\sigma ^4}\mathrm {I}_d \preceq {\mathbb {E}}\left[ \varGamma _t\right] ,\\&\text {if } \sigma ^2< 1, \ \ \ \frac{\sigma ^2}{3}\mathrm {I}_d\preceq \frac{\sigma ^2\left( 4\mathrm {C_p}(X) - \sigma ^2\right) }{2\mathrm {C_p}(X)(\sigma ^2 + 4)-\sigma ^4}\mathrm {I}_d \preceq {\mathbb {E}}\left[ \varGamma _t\right] , \end{aligned}$$

which gives the first bound. By (16), we also have the bound

$$\begin{aligned} \frac{d}{dt}{\mathbb {E}}\left[ \varGamma _t\right] = \frac{{\mathbb {E}}\left[ \varGamma _t\right] - {\mathbb {E}}\left[ \varGamma _t^2\right] }{1-t}\succeq \frac{1 - \frac{1}{t}}{1-t}{\mathbb {E}}\left[ \varGamma _t\right] = -\frac{1}{t}{\mathbb {E}}\left[ \varGamma _t\right] . \end{aligned}$$

The differential equation

$$\begin{aligned} g'(t) = -\frac{g(t)}{t}, g\left( \frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1}\right) = \frac{\min (1,\sigma ^2)}{3} \end{aligned}$$

has a unique solution given by

$$\begin{aligned} g(t) = \frac{\min (1,\sigma ^2)}{3}\frac{1}{ t\left( 2 \frac{\mathrm {C_p}(X)}{\sigma ^2}+1\right) }. \end{aligned}$$

Using Grönwall's inequality, we conclude that for every \(t \in \left[ \frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1},1\right] \),

$$\begin{aligned} {\mathbb {E}}\left[ \varGamma _t\right] \succeq \frac{\min (1,\sigma ^2)}{3}\frac{1}{ t\left( 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1\right) }\mathrm {I}_d. \end{aligned}$$

\(\square \)

We conclude this section with a comparison lemma that will allow us to control the values of \({\mathbb {E}}\left[ \left\Vert v_t\right\Vert _2^2\right] \).

Lemma 5

Let \(t_0 \in [0,1]\) and suppose that X is centered with a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \). Then

  1. 1.

    For \(t_0 \le t \le 1,\)

    $$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t\right\Vert _2^2\right] \ge {\mathbb {E}}\left[ \left\Vert v_{t_0}\right\Vert _2^2\right] \frac{t_0\left( \mathrm {C_p}(X)-1\right) t + t}{t_0\left( \mathrm {C_p}(X)-1\right) t + t_0}. \end{aligned}$$
  2. 2.

    For \(0 \le t \le t_0,\)

    $$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t\right\Vert _2^2\right] \le {\mathbb {E}}\left[ \left\Vert v_{t_0}\right\Vert _2^2\right] \frac{t_0\left( \mathrm {C_p}(X)-1\right) t + t}{t_0\left( \mathrm {C_p}(X)-1\right) t + t_0}. \end{aligned}$$

Proof

Consider the differential equation

$$\begin{aligned} g(t) = \left( \mathrm {C_p}(X)t^2 + t(1-t)\right) g'(t) \text { with initial condition } g(t_0) = {\mathbb {E}}\left[ \left\Vert v_{t_0}\right\Vert _2^2\right] . \end{aligned}$$

It has a unique solution given by

$$\begin{aligned} g(t) = {\mathbb {E}}\left[ \left\Vert v_{t_0}\right\Vert _2^2\right] \frac{t_0\left( \mathrm {C_p}(X)-1\right) t + t}{t_0\left( \mathrm {C_p}(X)-1\right) t + t_0}. \end{aligned}$$

The bounds follow by applying Grönwall's inequality combined with the result of Lemma 4. \(\square \)

3 Stability for 1-uniformly log-concave random vectors

In this section, we assume that X and Y are both 1-uniformly log-concave. Let \(B_t^X, B_t^Y\) be independent standard Brownian motions and consider the associated processes \(\varGamma _t^X, \varGamma _t^Y\) defined as in Sect. 2.

The key fact that makes the uniformly log-concave case easier is Lemma 3, which implies that \(\varGamma _t^X,\varGamma _t^Y \preceq \mathrm {I}_d\) almost surely. In this case, Lemma 2 simplifies to

$$\begin{aligned}&\delta _{EPI, \lambda }(X,Y) \nonumber \\&\quad \ge \frac{\lambda (1-\lambda )}{2}\int \limits _{0}^1\left( \frac{\mathrm {Tr}\left( \mathrm {Var}(\varGamma _t^X)\right) }{1-t} + \frac{\mathrm {Tr}\left( \mathrm {Var}(\varGamma _t^Y)\right) }{1-t} \right. \nonumber \\&\qquad \left. + \frac{\mathrm {Tr}\left( \left( {\mathbb {E}}\left[ \varGamma _t^X\right] - {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) ^2\right) }{1-t}\right) dt, \end{aligned}$$
(19)

where we have used the fact that

$$\begin{aligned} \mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] \right)= & {} \mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - {\mathbb {E}}\left[ \varGamma _t^X\right] \right) ^2\right] + {\mathbb {E}}\left[ \left( \varGamma _t^Y - {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) ^2\right] \right. \\&\left. + \left( {\mathbb {E}}\left[ \varGamma _t^X\right] - {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) ^2\right) . \end{aligned}$$

Consider the two Gaussian random vectors defined as

$$\begin{aligned} G_X = \int \limits _{0}^1{\mathbb {E}}\left[ \varGamma _t^X\right] dB_t^X \text { and } G_Y = \int \limits _0^1{\mathbb {E}}\left[ \varGamma _t^Y\right] dB_t^Y, \end{aligned}$$

and observe that

$$\begin{aligned} X= & {} \int \limits _{0}^1\varGamma _t^XdB_t^X = \int \limits _0^1\left( \varGamma _t^X - {\mathbb {E}}\left[ \varGamma _t^X\right] \right) dB_t^X + \int \limits _{0}^1{\mathbb {E}}\left[ \varGamma _t^X\right] dB_t^X\\= & {} \int \limits _0^1\left( \varGamma _t^X - {\mathbb {E}}\left[ \varGamma _t^X\right] \right) dB_t^X +G_X. \end{aligned}$$

This induces a coupling between X and \(G_X\) from which we obtain, using Itô's isometry,

$$\begin{aligned} \mathcal {W}_2^2 \left( X, G_X\right) \le {\mathbb {E}}\left[ \left\Vert \int \limits _0^1\left( \varGamma _t^X - {\mathbb {E}}\left[ \varGamma _t^X\right] \right) dB_t^X\right\Vert _2^2 \right] = \int \limits _0^1\mathrm {Tr}\left( \mathrm {Var}\left( \varGamma _t^X\right) \right) dt, \end{aligned}$$

and an analogous estimate also holds for Y. We may now use \({\mathbb {E}}\left[ \varGamma _t^X\right] \) and \({\mathbb {E}}\left[ \varGamma _t^Y\right] \) as the diffusion coefficients for the same Brownian motion to establish

$$\begin{aligned} \mathcal {W}^2_2(G_X, G_Y)\le & {} {\mathbb {E}}\left[ \left\Vert \int \limits _{0}^1\left( {\mathbb {E}}\left[ \varGamma _t^X\right] -{\mathbb {E}}\left[ \varGamma _t^Y\right] \right) dB_t\right\Vert _2^2\right] \\= & {} \int \limits _0^1\mathrm {Tr}\left( \left( {\mathbb {E}}\left[ \varGamma _t^X\right] - {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) ^2\right) dt. \end{aligned}$$

Plugging these estimates into (19) reproves the following bound, which is identical to Theorem 1 in [6].

Theorem 5

Let X and Y be 1-uniformly log-concave centered vectors and let \(G_X, G_Y\) be defined as above. Then,

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y) \ge \frac{\lambda (1-\lambda )}{2}\left( \mathcal {W}_2^2\left( X,G_X\right) +\mathcal {W}_2^2\left( Y,G_Y\right) + \mathcal {W}_2^2\left( G_X,G_Y\right) \right) . \end{aligned}$$
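
As a degenerate but explicit check (assuming one-dimensional Gaussian X and Y with variances at most one, for which \(\varGamma _t^X, \varGamma _t^Y\) are deterministic and \(G_X = X\), \(G_Y = Y\)), the coupling estimate preceding the theorem bounds \(\mathcal {W}_2^2(G_X,G_Y) = (\sqrt{s_X}-\sqrt{s_Y})^2\) by \(\int _0^1 ({\mathbb {E}}[\varGamma _t^X]-{\mathbb {E}}[\varGamma _t^Y])^2 dt\). A short numerical sketch, where \(\varGamma _t = \frac{s}{1-t+ts}\) is the closed form under this Gaussian assumption:

```python
import numpy as np

def gamma(t, s):
    # E[Gamma_t] for a one-dimensional Gaussian target N(0, s) (deterministic in this case).
    return s / (1.0 - t + t * s)

s_x, s_y = 0.9, 0.4                         # variances <= 1, as in the 1-uniformly log-concave setting
t = np.linspace(0.0, 1.0, 100001)
dt = t[1] - t[0]

bound = np.sum((gamma(t, s_x) - gamma(t, s_y)) ** 2) * dt
exact = (np.sqrt(s_x) - np.sqrt(s_y)) ** 2  # W_2^2 between N(0, s_x) and N(0, s_y)
print(exact, "<=", bound)
```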

To obtain a bound for the relative entropy towards the proof of Theorem 1, we will require a slightly more general version of inequality (9). This is the content of the next lemma, whose proof is similar to the argument presented above. The main difference comes from applying Girsanov’s theorem to a re-scaled Brownian motion, from which we obtain an expression analogous to (6). The reader is referred to [10, Lemma 2], for a complete proof.

Lemma 6

Let \(F_t\) and \(E_t\) be two \(\mathcal {F}_t\)-adapted matrix-valued processes and let \(Z_t\), \(M_t\) be the two processes defined by

$$\begin{aligned} Z_t = \int \limits _{0}^tF_sdB_s, \text { and } M_t = \int \limits _{0}^tE_sdB_s. \end{aligned}$$

Suppose that for every \(t\in [0,1]\), \(E_t \succeq c\mathrm {I}_d\) for some deterministic \(c > 0\), then

$$\begin{aligned} \mathrm {D}(Z_1||M_1) \le \mathrm {Tr}\int \limits _0^1\frac{{\mathbb {E}}\left[ \left( F_t - E_t\right) ^2\right] }{c^2(1-t)}dt. \end{aligned}$$

Proof of Theorem 1

By Corollary 2

$$\begin{aligned} {\mathbb {E}}\left[ \varGamma _t^X\right] \succeq \sigma _X^2\mathrm {I}_d \text { and } {\mathbb {E}}\left[ \varGamma _t^Y\right] \succeq \sigma _Y^2\mathrm {I}_d\text { for every } t\in [0,1]. \end{aligned}$$

We invoke Lemma 6 with \(E_t = {\mathbb {E}}\left[ \varGamma _t^X\right] \) and \(F_t = \varGamma _t^X\) to obtain

$$\begin{aligned} \sigma _X^4\mathrm {D}(X||G_X)\le \int \limits _{0}^1\frac{\mathrm {Tr}\left( \mathrm {Var}\left( \varGamma _t^X\right) \right) }{1-t}dt. \end{aligned}$$

Repeating the same argument for Y gives

$$\begin{aligned} \sigma _Y^4\mathrm {D}(Y||G_Y)\le \int \limits _{0}^1\frac{\mathrm {Tr}\left( \mathrm {Var}\left( \varGamma _t^Y\right) \right) }{1-t}dt. \end{aligned}$$

By invoking Lemma 6 with \(F_t = {\mathbb {E}}\left[ \varGamma _t^X\right] \) and \(E_t = {\mathbb {E}}\left[ \varGamma _t^Y\right] \) and then one more time after switching between \(F_t\) and \(E_t\), and summing the results, we get

$$\begin{aligned} \frac{\sigma _Y^4}{2}\mathrm {D}(G_X||G_Y) + \frac{\sigma _X^4}{2}\mathrm {D}(G_Y||G_X)\le \int \limits _{0}^1\frac{\mathrm {Tr}\left( \left( {\mathbb {E}}\left[ \varGamma _t^X\right] - {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) ^2\right) }{1-t}dt. \end{aligned}$$

Plugging the above inequalities into (19) concludes the proof. \(\square \)

4 Stability for general log-concave random vectors

Fix X, Y, centered log-concave random vectors in \({\mathbb {R}}^d\), such that

$$\begin{aligned} \mathrm {Cov}(Y) + \mathrm {Cov}(X) = 2\mathrm {I}_d, \end{aligned}$$
(20)

with \(\sigma _X^2,\sigma _Y^2\) the corresponding minimal eigenvalues of \(\mathrm {Cov}(X)\) and \(\mathrm {Cov}(Y)\). Assume further that \(\frac{\mathrm {C_p}(Y)}{\sigma _Y^2},\frac{\mathrm {C_p}(X)}{\sigma _X^2} \le \mathrm {C_p}\), for some \(\mathrm {C_p} >1\). Again, let \(B_t^X\) and \(B_t^Y\) be independent Brownian motions and consider the associated processes \(\varGamma _t^X, \varGamma _t^Y\) defined as in Sect. 2.

The general log-concave case, in comparison with the case where X and Y are uniformly log-concave, gives rise to two essential difficulties. Recall that the results in the previous section used the fact that an upper bound for the matrices \(\varGamma _t^X,\varGamma _t^Y\), combined with equation (14) gives the simpler bound (19). Unfortunately, in the general log-concave case, there is no upper bound uniform in t, which creates the first problem. The second issue has to do with the lack of respective lower bounds for \({\mathbb {E}}[\varGamma _t^X]\) and \({\mathbb {E}}[\varGamma _t^Y]\): in view of Lemma 6, one needs such bounds in order to obtain estimates on the entropies.

The solution of the second issue lies in Corollary 3, which gives a lower bound for the processes in terms of the Poincaré constants. We denote \(\xi = \frac{1}{(2\mathrm {C_p}+1)}\frac{\min (\sigma _Y^2,\sigma _X^2)}{3}\), so that the corollary gives

$$\begin{aligned} {\mathbb {E}}\left[ \varGamma _t^Y\right] ,{\mathbb {E}}\left[ \varGamma _t^X\right] \succeq \xi \mathrm {I}_d. \end{aligned}$$
(21)

Thus, we are left with the issue arising from the lack of a uniform upper bound for the matrices \(\varGamma _t^X,\varGamma _t^Y\). Note that Lemma 3 gives \(\varGamma _t^X\preceq \frac{1}{t}\mathrm {I}_d\), a bound which is not uniform in t. To illustrate how one may overcome this issue, suppose that there exists an \(\varepsilon >0\), such that

$$\begin{aligned} \int \limits _0^\varepsilon \frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] \right) }{(1-t)}dt < \frac{1}{2} \int \limits _0^1\frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] \right) }{(1-t)}dt. \end{aligned}$$

In such a case, Lemma 2 would imply

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y) \gtrsim \varepsilon \lambda (1-\lambda )\mathrm {Tr}\int \limits _{0}^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X-\varGamma _t^Y\right) ^2\right] }{1-t}dt. \end{aligned}$$

Towards finding an \(\varepsilon \) such that the above holds, note that, since \(v_t^X\) is a martingale and using (6), we have for every \(t_0 \in [0,1]\),

$$\begin{aligned} (1-t_0)\mathrm {D}\left( X||G\right) =\frac{1-t_0}{2}\int \limits _{0}^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt \le \frac{1}{2}\int \limits _{t_0}^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt \le \mathrm {D}\left( X||G\right) .\nonumber \\ \end{aligned}$$
(22)

Observe that

$$\begin{aligned} \mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] \right)= & {} \mathrm {Tr}\left( {\mathbb {E}}\left[ \left( \varGamma _t^X - \mathrm {I}_d\right) ^2\right] + {\mathbb {E}}\left[ \left( \varGamma _t^Y - \mathrm {I}_d\right) ^2\right] \right. \\&\left. -2{\mathbb {E}}\left[ \mathrm {I}_d - \varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d - \varGamma _t^Y\right] \right) . \end{aligned}$$

Using the relation in (11), Fubini’s theorem shows

$$\begin{aligned} \int \limits _{t_0}^{1}{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt&= \mathrm {Tr}\int \limits _{t_0}^{1}\int \limits _{0}^t \frac{{\mathbb {E}}\left[ \left( \varGamma _s^X-\mathrm {I}_d\right) ^2\right] }{(1-s)^2}dsdt \nonumber \\&= \mathrm {Tr}\int \limits _{0}^{t_0}\int \limits _{t_0}^{1}\frac{{\mathbb {E}}\left[ \left( \varGamma _s^X-\mathrm {I}_d\right) ^2\right] }{(1-s)^2}dtds + \mathrm {Tr}\int \limits _{t_0}^{1}\int \limits _{s}^1\frac{{\mathbb {E}}\left[ \left( \varGamma _s^X-\mathrm {I}_d\right) ^2\right] }{(1-s)^2}dtds\nonumber \\&= (1-t_0){\mathbb {E}}\left[ \left\Vert v_{t_0}^X\right\Vert _2^2\right] + \mathrm {Tr}\int \limits _{t_0}^{1}\frac{{\mathbb {E}}\left[ \left( \varGamma _s^X-\mathrm {I}_d\right) ^2\right] }{1-s}ds. \end{aligned}$$

Combining the last two displays gives

$$\begin{aligned} \mathrm {Tr}\int \limits _{t_0}^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] }{1-t}dt&= \int \limits _{t_0}^1\bigg ({\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] + {\mathbb {E}}\left[ \left\Vert v_t^Y\right\Vert _2^2\right] \bigg )dt \nonumber \\&\quad - (1-t_0)\left( {\mathbb {E}}\left[ \left\Vert v_{t_0}^X\right\Vert _2^2\right] +{\mathbb {E}}\left[ \left\Vert v_{t_0}^Y\right\Vert _2^2\right] \right) \nonumber \\&\quad -2\mathrm {Tr}\int \limits _{t_0}^1\frac{{\mathbb {E}}\left[ \mathrm {I}_d - \varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d - \varGamma _t^Y\right] }{1-t}dt. \end{aligned}$$
(23)

Using (17), we have the identities:

$$\begin{aligned} \frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] }{1-t} = {\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \mathrm {I}_d-\mathrm {Cov}(X) \end{aligned}$$

and

$$\begin{aligned} \frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t} = {\mathbb {E}}\left[ v^Y_t\otimes v^Y_t\right] + \mathrm {I}_d-\mathrm {Cov}(Y), \end{aligned}$$

from which we deduce

$$\begin{aligned}&2\frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t}\\&\quad = \left( \mathrm {I}_d- {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) {\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \left( \mathrm {I}_d- {\mathbb {E}}\left[ \varGamma _t^X\right] \right) {\mathbb {E}}\left[ v^Y_t\otimes v^Y_t\right] \\&\qquad +\left( \mathrm {I}_d- {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) \left( \mathrm {I}_d - \mathrm {Cov}(X)\right) + \left( \mathrm {I}_d- {\mathbb {E}}\left[ \varGamma _t^X\right] \right) \left( \mathrm {I}_d - \mathrm {Cov}(Y)\right) . \end{aligned}$$

Let \(\{w_i\}_{i=1}^d\) be an orthonormal basis of eigenvectors corresponding to the eigenvalues \(\{\lambda _i\}_{i=1}^d\) of \(\mathrm {I}_d-{\mathbb {E}}\left[ \varGamma _t^X\right] \). The following observation, which follows from the above identities, is crucial: if \(\lambda _i \le 0\) then necessarily

$$\begin{aligned} \langle w_i, \mathrm {Cov}(X)w_i\rangle \ge 1. \end{aligned}$$
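
To see this, test the identity \(\frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] }{1-t} = {\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \mathrm {I}_d-\mathrm {Cov}(X)\) against \(w_i\): since \(\langle w_i, {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] w_i\rangle = \lambda _i\), we get

$$\begin{aligned} \frac{\lambda _i}{1-t} = {\mathbb {E}}\left[ \langle v_t^X,w_i\rangle ^2\right] + 1 - \langle w_i, \mathrm {Cov}(X)w_i\rangle , \end{aligned}$$

so \(\lambda _i \le 0\) indeed forces \(\langle w_i, \mathrm {Cov}(X)w_i\rangle \ge 1 + {\mathbb {E}}\left[ \langle v_t^X,w_i\rangle ^2\right] \ge 1\).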

In this case, by assumption (20), \(\langle w_i, \mathrm {Cov}(Y)w_i\rangle \le 1\) and

$$\begin{aligned} \left\langle w_i,\frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t} w_i \right\rangle \le 0. \end{aligned}$$

Our aim is to bound (23) from below; thus, in computing the trace on the RHS, we may disregard all \(w_i\) corresponding to negative \(\lambda _i\). Moreover, when \(\lambda _i \ge 0\), it suffices to consider those \(w_i\) for which

$$\begin{aligned} \langle w_i,\left( \mathrm {I}_d- {\mathbb {E}}\left[ \varGamma _t^Y\right] \right) w_i\rangle \ge 0, \end{aligned}$$
(24)

as well. We note that these assumptions on \(w_i\) also imply

$$\begin{aligned}&\langle w_i,\left( {\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \mathrm {I}_d-\mathrm {Cov}(X)\right) w_i\rangle \ge 0 \text { and }\nonumber \\&\langle w_i,\left( {\mathbb {E}}\left[ v^Y_t\otimes v^Y_t\right] + \mathrm {I}_d-\mathrm {Cov}(Y)\right) w_i\rangle \ge 0. \end{aligned}$$
(25)
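
Indeed, testing the two identities obtained from (17) above against \(w_i\) gives

$$\begin{aligned} \langle w_i,\left( {\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \mathrm {I}_d-\mathrm {Cov}(X)\right) w_i\rangle&= \frac{\lambda _i}{1-t},\\ \langle w_i,\left( {\mathbb {E}}\left[ v^Y_t\otimes v^Y_t\right] + \mathrm {I}_d-\mathrm {Cov}(Y)\right) w_i\rangle&= \frac{\langle w_i,{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] w_i\rangle }{1-t}, \end{aligned}$$

and both right-hand sides are non-negative, by \(\lambda _i \ge 0\) and by (24), respectively.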

Since \(w_i\) is an eigenvector of \(\mathrm {I}_d-{\mathbb {E}}\left[ \varGamma _t^X\right] \), it is also an eigenvector of \({\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \mathrm {I}_d-\mathrm {Cov}(X)\) (the two matrices are proportional, by the first identity above), and we have the following equality:

$$\begin{aligned}&2 \left\langle w_i,\frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t} w_i \right\rangle \\&\quad = \langle w_i,{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] w_i\rangle \left( {\mathbb {E}}\left[ \langle v_t^Y,w_i\rangle ^2\right] + 1 - \langle w_i, \mathrm {Cov}(Y)w_i\rangle \right) \\&\qquad +\langle w_i,{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] w_i\rangle \left( {\mathbb {E}}\left[ \langle v_t^X,w_i\rangle ^2\right] + 1 - \langle w_i, \mathrm {Cov}(X)w_i\rangle \right) . \end{aligned}$$

The fact that \(\lambda _i \ge 0\), together with (24) and (25), ensures that all four terms are non-negative. Using the estimate (21), which gives \(\langle w_i,{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] w_i\rangle , \langle w_i,{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] w_i\rangle \le 1-\xi \), the previous equation is bounded from above by

$$\begin{aligned}&(1-\xi )\big ({\mathbb {E}}\left[ \langle v_t^Y,w_i\rangle ^2\right] + 1 - \langle w_i, \mathrm {Cov}(Y)w_i\rangle + {\mathbb {E}}\left[ \langle v_t^X,w_i\rangle ^2\right] + 1 - \langle w_i, \mathrm {Cov}(X)w_i\rangle \big ) \\&\quad =(1-\xi )\big ({\mathbb {E}}\left[ \langle v_t^Y,w_i\rangle ^2\right] + {\mathbb {E}}\left[ \langle v_t^X,w_i\rangle ^2\right] \big ), \end{aligned}$$

where we have used (20). Summing over all the relevant \(w_i\) we get

$$\begin{aligned} 2\mathrm {Tr}\frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t} \le (1-\xi )\left( {\mathbb {E}}\left[ \left\Vert v^X_t\right\Vert _2^2\right] + {\mathbb {E}}\left[ \left\Vert v^Y_t\right\Vert _2^2\right] \right) . \end{aligned}$$

Plugging this into (23) and using (22) we have thus shown

$$\begin{aligned} \mathrm {Tr}\int \limits _{t_0}^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] }{1-t}dt&\ge 2\xi (1-t_0) \left( \mathrm {D}(X||G) + \mathrm {D}(Y||G)\right) \nonumber \\&\quad - (1-t_0)\left( {\mathbb {E}}\left[ \left\Vert v_{t_0}^X\right\Vert _2^2\right] +{\mathbb {E}}\left[ \left\Vert v_{t_0}^Y\right\Vert _2^2\right] \right) . \end{aligned}$$
(26)

This suggests that it may be useful to bound \({\mathbb {E}}\left[ \left\Vert v^X_{t_0}\right\Vert _2^2\right] \) from above, for small values of \(t_0\), which is the objective of the next lemma.

Lemma 7

If X is centered and has a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \), then for every \(s \le \frac{1}{3(2\mathrm {C_p}(X)+1)}\) the following holds

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v^X_{s^2}\right\Vert ^2_2\right] < \frac{s}{4}\cdot \mathrm {D}(X||G). \end{aligned}$$

Proof

Suppose to the contrary that \({\mathbb {E}}\left[ \left\Vert v_{s^2}^X\right\Vert ^2_2\right] \ge \frac{s}{4}\cdot \mathrm {D}(X||G)\). Invoking Lemma 5 with \(t_0 = s^2\) gives

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \ge \mathrm {D}(X||G)\cdot \frac{t\left( (\mathrm {C_p}(X) -1)s^2 + 1\right) }{4\left( (\mathrm {C_p}(X) -1)st + s\right) }, \end{aligned}$$

whenever \(t \ge s^2\). Thus,

$$\begin{aligned}&\int \limits _{s^2}^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt \nonumber \\&\quad \ge \mathrm {D}(X||G)\int \limits _{s^2}^1\frac{t\left( (\mathrm {C_p}(X) -1)s^2 + 1\right) }{4\left( (\mathrm {C_p}(X) -1)st + s\right) }dt \nonumber \\&\quad = \mathrm {D}(X||G)\left( (\mathrm {C_p}(X)-1)s^2+1\right) \frac{(\mathrm {C_p}(X)-1)t-\ln \left( t\left( \mathrm {C_p}(X)-1\right) + 1\right) }{4(\mathrm {C_p}(X)-1)^2s}\Bigg \vert _{s^2}^1. \end{aligned}$$
(27)

Note now that for \(s\le \frac{1}{3(2\mathrm {C_p}(X)+1)}\)

$$\begin{aligned} \frac{d}{ds} \frac{t\left( (\mathrm {C_p}(X) -1)s^2 + 1\right) }{4\left( (\mathrm {C_p}(X) -1)st + s\right) } = \frac{t\left( \left( \mathrm {C_p}(X)-1\right) s^2-1\right) }{4s^2\left( (\mathrm {C_p}(X)-1)t+1\right) }<0, \end{aligned}$$

so the integrand in (27) is decreasing in \(s\), and the domain of integration \([s^2,1]\) shrinks as \(s\) increases; in particular, the right-hand side of (27) is bounded from below by its value at \(s = \frac{1}{3(2\mathrm {C_p}(X)+1)}\). A straightforward calculation shows that this value exceeds \(\mathrm {D}(X||G)\), and hence

$$\begin{aligned} \int \limits _{s^2}^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt > \mathrm {D}(X||G), \end{aligned}$$

which contradicts the identity (6) and concludes the proof by contradiction. \(\square \)

We would like to use the lemma with the choice \(s = \xi \), so that it bounds \({\mathbb {E}}\left[ \left\Vert v^X_{\xi ^2}\right\Vert _2^2\right] \). In order to verify the condition of the lemma, which amounts to \(\xi \le \frac{1}{3(2\mathrm {C_p}(X)+1)}\), we first remark that if \(\sigma _X^2 \le 1\), then it is clear that \(\xi \le \frac{1}{3(2\mathrm {C_p}(X)+1)}\). Otherwise, \(\sigma _X^2 \ge 1\) and

$$\begin{aligned} \xi \le \frac{1}{2\frac{\mathrm {C_p}(X)}{\sigma _X^2}+1}\frac{\sigma _Y^2}{3} \le \frac{1}{2\frac{\mathrm {C_p}(X)}{\sigma _X^2}+1}\frac{2 - \sigma _X^2}{3} \le \frac{1}{3(2\mathrm {C_p}(X)+1)}. \end{aligned}$$
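
The last inequality above can be checked directly: writing \(u := \sigma _X^2 \ge 1\), it is equivalent to

$$\begin{aligned} (2-u)\left( 2\mathrm {C_p}(X)+1\right) \le \frac{2\mathrm {C_p}(X)}{u}+1, \end{aligned}$$

which holds since the two sides agree at \(u=1\) and the difference between the right-hand side and the left-hand side is increasing in \(u\) for \(u \ge 1\).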

As the same reasoning is also true for Y, we now choose \(t_0 = \xi ^2\) and invoke the previous lemma, with \(s = \xi \), for both X and Y in (26). Since \({\mathbb {E}}\left[ \left\Vert v_{\xi ^2}^X\right\Vert _2^2\right] +{\mathbb {E}}\left[ \left\Vert v_{\xi ^2}^Y\right\Vert _2^2\right] < \frac{\xi }{4}\left( \mathrm {D}(X||G)+\mathrm {D}(Y||G)\right) \) and \(\frac{7}{4}\left( 1-\xi ^2\right) \ge 1\) (recall that \(\xi \le \frac{1}{3}\)), this establishes:

$$\begin{aligned} \mathrm {Tr}\int \limits _{\xi ^2}^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] }{1-t}dt \ge \xi \left( \mathrm {D}(X||G) +\mathrm {D}(Y||G)\right) . \end{aligned}$$
(28)

We are finally ready to prove the main theorem.

Proof of Theorem 2

Denote \(\xi = \frac{1}{(2\mathrm {C_p}+1)}\frac{\min (\sigma _Y^2,\sigma _X^2)}{3}\). Since X and Y are log-concave, by Lemma 3, \(\varGamma _t^X, \varGamma _t^Y \preceq \frac{1}{t}\mathrm {I}_d\) almost surely. Thus, Lemma 2 gives

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y) \ge \frac{\xi ^2\lambda (1-\lambda )}{2}\int \limits _{\xi ^2}^1 \frac{\mathrm {Tr}\left( {\mathbb {E}}\left[ (\varGamma _t^X - \varGamma _t^Y)^2\right] \right) }{1-t}dt. \end{aligned}$$

By noting that \(\mathrm {C_p} \ge 1\), so that \(2\mathrm {C_p}+1 \le 3\mathrm {C_p}\) and hence \(\xi \ge \frac{\min (\sigma _Y^2,\sigma _X^2)}{9\mathrm {C_p}}\), the bound (28) gives

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y)&\ge \frac{\xi ^3\lambda (1-\lambda )}{2}\left( \mathrm {D}\left( X||G\right) +\mathrm {D}\left( Y||G\right) \right) \\&\ge K\lambda (1-\lambda )\left( \frac{\min (\sigma _Y^2,\sigma _X^2)}{\mathrm {C_p}}\right) ^3\left( \mathrm {D}\left( X||G\right) + \mathrm {D}\left( Y||G\right) \right) , \end{aligned}$$

for some numerical constant \(K>0\). \(\square \)
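
For concreteness, tracking the constants in the last display (and recalling \(\xi \ge \frac{\min (\sigma _Y^2,\sigma _X^2)}{9\mathrm {C_p}}\)) shows that

$$\begin{aligned} \frac{\xi ^3\lambda (1-\lambda )}{2} \ge \frac{\lambda (1-\lambda )}{1458}\left( \frac{\min (\sigma _Y^2,\sigma _X^2)}{\mathrm {C_p}}\right) ^3, \end{aligned}$$

so that one may take, for instance, \(K = \frac{1}{1458}\).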

5 Further results

5.1 Stability for low entropy log-concave measures

In this section we focus on the case where X and Y are log-concave and isotropic. As in the previous section, we set \(\xi _X = \frac{1}{3(2\mathrm {C_p}(X) + 1)}\), so that by Corollary 3,

$$\begin{aligned} {\mathbb {E}}\left[ \varGamma _t^X\right] \succeq \xi _X\mathrm {I}_d. \end{aligned}$$

Towards the proof of Theorem 3, we first need an analogue of Lemma 7, for which we sketch the proof here.

Lemma 8

If X is centered and has a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \), then

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v^X_{\xi _X}\right\Vert ^2_2\right] < \frac{1}{4}\mathrm {D}(X||G). \end{aligned}$$

Proof

Assume, by way of contradiction, that \({\mathbb {E}}\left[ \left\Vert v^X_{\xi _X}\right\Vert _2^2\right] \ge \frac{1}{4}\mathrm {D}(X||G)\). In this case, Lemma 5 implies, for every \(t \ge \xi _X\),

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \ge \mathrm {D}(X||G)\cdot \frac{t\left( (\mathrm {C_p}(X) -1)\xi _X + 1\right) }{4\left( (\mathrm {C_p}(X) -1)\xi _Xt + \xi _X\right) }. \end{aligned}$$

A calculation then shows that

$$\begin{aligned} \int \limits _{\xi _X}^1{\mathbb {E}}\left[ \left\Vert v^X_t\right\Vert _2^2\right] dt \ge \mathrm {D}(X||G), \end{aligned}$$

which contradicts (6). \(\square \)

Proof of Theorem 3

Since \(v_t^X\) is a martingale, \(t \mapsto {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \) is non-decreasing. Together with (6), this yields the elementary inequality

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_s^X\right\Vert _2^2\right] \le \frac{1}{1-s}\int \limits _s^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert ^2_2\right] dt \le \frac{1}{1-s}\int \limits _0^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert ^2_2\right] dt = \frac{2\mathrm {D}(X||G)}{1-s}, \end{aligned}$$

which holds for every \(s \in [0,1]\). For isotropic X, Equation (17) shows that, for all \(t \in [0,1]\),

$$\begin{aligned} (1-t){\mathbb {E}}\left[ \left\Vert v^X_t\right\Vert _2^2\right] =\mathrm {Tr}\left( \mathrm {I}_d - {\mathbb {E}}\left[ \varGamma ^X_t\right] \right) \le 2\mathrm {D}(X||G) \le \frac{1}{2}, \end{aligned}$$

where the second inequality is by assumption. Note that Equation (17) also shows that \({\mathbb {E}}\left[ \varGamma _t^X\right] \preceq \mathrm {I}_d\); combined with the previous display (the trace of a positive semi-definite matrix dominates each of its eigenvalues), this yields, for every \(t \in [0,1]\),

$$\begin{aligned} 0 \preceq \mathrm {I}_d - {\mathbb {E}}\left[ \varGamma ^X_t\right] \preceq \frac{1}{2}\mathrm {I}_d. \end{aligned}$$

Applying this to Y as well produces the bound

$$\begin{aligned} 2\mathrm {Tr}\frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t}&\le \frac{1}{2}\mathrm {Tr}\left( \frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] }{1-t}\right) + \frac{1}{2}\mathrm {Tr}\left( \frac{{\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] }{1-t}\right) \\&= \frac{1}{2}\left( {\mathbb {E}}\left[ \left\Vert v^X_t\right\Vert _2^2\right] +{\mathbb {E}}\left[ \left\Vert v^Y_t\right\Vert _2^2\right] \right) . \end{aligned}$$
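
The inequality above uses the following elementary fact: if \(0 \preceq A \preceq \frac{1}{2}\mathrm {I}_d\) and \(B \succeq 0\), then

$$\begin{aligned} \mathrm {Tr}\left( AB\right) = \mathrm {Tr}\left( B^{1/2}AB^{1/2}\right) \le \frac{1}{2}\mathrm {Tr}\left( B\right) ; \end{aligned}$$

this is applied once with \(A = {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^X\right] \), \(B = {\mathbb {E}}\left[ \mathrm {I}_d-\varGamma _t^Y\right] \) and once with the roles of X and Y exchanged, and the two resulting bounds are summed.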

Set \(\xi = \min (\xi _X,\xi _Y)\). Repeating the calculation leading to (23), using the above bound on the cross term together with (22), gives

$$\begin{aligned}&\mathrm {Tr}\int \limits _{\xi }^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] }{1-t}dt\ge (1-\xi ) \left( \mathrm {D}(X||G) + \mathrm {D}(Y||G)\right) \nonumber \\&\quad - (1-\xi )\left( {\mathbb {E}}\left[ \left\Vert v_{\xi }^X\right\Vert _2^2\right] +{\mathbb {E}}\left[ \left\Vert v_{\xi }^Y\right\Vert _2^2\right] \right) . \end{aligned}$$

Lemma 8, combined with the monotonicity of \(t \mapsto {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \) (recall that \(\xi \le \xi _X, \xi _Y\)) and with the bound \(\xi \le \frac{1}{3}\), implies

$$\begin{aligned} \mathrm {Tr}\int \limits _{\xi }^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] }{1-t}dt\ge & {} \frac{3}{4}(1-\xi )\left( \mathrm {D}(X||G) + \mathrm {D}(Y||G)\right) \\\ge & {} \frac{1}{2}\left( \mathrm {D}(X||G) + \mathrm {D}(Y||G)\right) . \end{aligned}$$

Finally, by Lemma 3, \(\varGamma _t^X,\varGamma _t^Y \preceq \frac{1}{t}\mathrm {I}_d\) almost surely for all \(t \in [0,1]\). We now invoke Lemma 2 to obtain

$$\begin{aligned} \delta _{EPI, \lambda }(X,Y)&\ge \frac{\xi \lambda (1-\lambda )}{2}\mathrm {Tr}\int \limits _{\xi }^1\frac{{\mathbb {E}}\left[ \left( \varGamma _t^X - \varGamma _t^Y\right) ^2\right] }{1-t}dt\\&\ge \frac{\xi \lambda (1-\lambda )}{4}\left( \mathrm {D}(X||G) + \mathrm {D}(Y||G)\right) . \end{aligned}$$

\(\square \)

5.2 Stability under convolution with a Gaussian

Proof of Theorem 4

Fix \(\lambda \in (0,1)\); by (7) we have that

$$\begin{aligned} \sqrt{\lambda }\left( \sqrt{\lambda }X_1 + \sqrt{1-\lambda }G\right) {\mathop {=}\limits ^{d}} B_\lambda + \int \limits _0^\lambda v_t^Xdt. \end{aligned}$$

As the relative entropy is invariant under applying the same invertible affine map to both of its arguments, this implies

$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }\left( \sqrt{\lambda }X_1 + \sqrt{1-\lambda }G\right) \Big |\Big |\sqrt{\lambda } G\right)= & {} \mathrm {D}\left( \sqrt{\lambda }X_1 + \sqrt{1-\lambda }G\Big |\Big |G\right) \nonumber \\= & {} \frac{1}{2}\int \limits _0^\lambda {\mathbb {E}}\left[ \left\Vert v^X_t\right\Vert _2^2\right] dt. \end{aligned}$$
(29)

Lemma 5 yields

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \ge {\mathbb {E}}\left[ \left\Vert v_\lambda ^X\right\Vert _2^2\right] \frac{\lambda \left( \mathrm {C_p}(X)-1\right) t + t}{\lambda \left( \mathrm {C_p}(X)-1\right) t + \lambda } \text { for } t \ge \lambda , \end{aligned}$$

and

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \le {\mathbb {E}}\left[ \left\Vert v_\lambda ^X\right\Vert _2^2\right] \frac{\lambda \left( \mathrm {C_p}(X)-1\right) t + t}{\lambda \left( \mathrm {C_p}(X)-1\right) t + \lambda } \text { for } t \le \lambda . \end{aligned}$$

Denote

$$\begin{aligned} I_1 := \int \limits _{\lambda }^1\frac{\lambda \left( \mathrm {C_p}(X)-1\right) t + t}{\lambda \left( \mathrm {C_p}(X)-1\right) t + \lambda }dt \text { and } I_2 := \int \limits _{0}^\lambda \frac{\lambda \left( \mathrm {C_p}(X)-1\right) t + t}{\lambda \left( \mathrm {C_p}(X)-1\right) t + \lambda }dt. \end{aligned}$$

A calculation shows

$$\begin{aligned} I_1 = \frac{\left( \lambda \left( \mathrm {C_p}(X) - 1\right) + 1\right) \left( (1-\lambda )\left( \mathrm {C_p}(X) - 1\right) -\ln \left( \mathrm {C_p}(X)\right) + \ln \left( \lambda \left( \mathrm {C_p}(X)-1\right) +1\right) \right) }{\lambda \left( \mathrm {C_p}(X) - 1\right) ^2}, \end{aligned}$$

as well as

$$\begin{aligned} I_2 = \frac{\left( \lambda \left( \mathrm {C_p}(X) - 1\right) + 1\right) \left( \lambda (\mathrm {C_p}(X) - 1) - \ln \left( \lambda \left( \mathrm {C_p}(X)-1\right) +1\right) \right) }{\lambda \left( \mathrm {C_p}(X) - 1\right) ^2}. \end{aligned}$$
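
Both formulas follow by writing the integrand as \(\frac{\lambda \left( \mathrm {C_p}(X)-1\right) +1}{\lambda }\cdot \frac{t}{\left( \mathrm {C_p}(X)-1\right) t+1}\) and using the elementary antiderivative

$$\begin{aligned} \int \frac{t}{\left( \mathrm {C_p}(X)-1\right) t+1}\,dt = \frac{\left( \mathrm {C_p}(X)-1\right) t-\ln \left( \left( \mathrm {C_p}(X)-1\right) t+1\right) }{\left( \mathrm {C_p}(X)-1\right) ^2} + \mathrm {const}, \end{aligned}$$

valid for \(\mathrm {C_p}(X) \ne 1\); the case \(\mathrm {C_p}(X)=1\) is recovered by taking limits.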

Thus, the above bounds give

$$\begin{aligned} \mathrm {D}(X||G)&=\frac{1}{2}\int \limits _{0}^1{\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt \ge \frac{1}{2}\int \limits _{0}^\lambda {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt + \frac{{\mathbb {E}}\left[ \left\Vert v_\lambda ^X\right\Vert _2^2\right] }{2}I_1, \end{aligned}$$

and

$$\begin{aligned} 0 \le \frac{1}{2}\int \limits _{0}^\lambda {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] dt \le \frac{{\mathbb {E}}\left[ \left\Vert v_\lambda ^X\right\Vert _2^2\right] }{2}I_2. \end{aligned}$$

Now, since the expression \(\frac{\alpha }{\alpha + \beta }\) is monotone increasing with respect to \(\alpha \) and decreasing with respect to \(\beta \) whenever \(\alpha ,\beta > 0\), those two inequalities together with (29) imply that

$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }G\Big |\Big |G\right) ~&\le \frac{I_2}{I_1+I_2} \mathrm {D}(X||G) \\&= \frac{\lambda \left( \mathrm {C_p}(X) - 1\right) - \ln \left( \lambda \left( \mathrm {C_p}(X)-1\right) +1\right) }{\mathrm {C_p}(X) -\ln \left( \mathrm {C_p}(X)\right) - 1}\mathrm {D}(X||G). \end{aligned}$$

Rewriting the above in terms of the deficit in the Shannon–Stam inequality, we have established

$$\begin{aligned} \delta _{EPI, \lambda }(X,G)&= \lambda \mathrm {D}(X||G) - \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }G\Big |\Big |G\right) \\&\ge \left( \lambda - \frac{\lambda \left( \mathrm {C_p}(X) - 1\right) - \ln \left( \lambda \left( \mathrm {C_p}(X)-1\right) +1\right) }{\mathrm {C_p}(X) -\ln \left( \mathrm {C_p}(X)\right) - 1}\right) \mathrm {D}(X||G). \end{aligned}$$

\(\square \)
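
For instance, for \(\mathrm {C_p}(X) = 2\) and \(\lambda = \frac{1}{2}\), the prefactor on the right-hand side evaluates to

$$\begin{aligned} \frac{1}{2} - \frac{\frac{1}{2}-\ln \frac{3}{2}}{1-\ln 2} \approx 0.19, \end{aligned}$$

so in this case the deficit controls roughly a fifth of \(\mathrm {D}(X||G)\).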