Abstract
Consider \((X_{i}(t))\) solving a system of N stochastic differential equations interacting through a random matrix \({\mathbf {J}} = (J_{ij})\) with independent (not necessarily identically distributed) random coefficients. We show that the trajectories of averaged observables of \((X_i(t))\), initialized from some \(\mu \) independent of \({\mathbf {J}}\), are universal, i.e., they depend on the choice of the distribution of \({\mathbf {J}}\) only through its first and second moments (assuming e.g., sub-exponential tails). We take a general combinatorial approach to proving universality for dynamical systems with random coefficients, combining a stochastic Taylor expansion with a moment matching-type argument. Concrete settings for which our results imply universality include aging in the spherical SK spin glass, and Langevin dynamics and gradient flows for symmetric and asymmetric Hopfield networks.
1 Introduction
Markov processes with random coefficients arise in numerous contexts: e.g., dynamics of spin glasses, optimization on random landscapes, and learning with neural networks. In many cases, when the underlying randomness is Gaussian, they have been found to give rise to a rich class of behaviors, including metastability, trapping, and aging. In this paper, we analyze a class of stochastic differential systems (SDS’s) in their high dimensional limit, where the couplings are linear and encoded by a random matrix. We show that trajectories of polynomial statistics of the SDS are universal: they have the same high-dimensional behavior if one replaces the Gaussian interaction matrix by a non-Gaussian one with the same mean and variance profiles.
Universality can broadly be described as the phenomenon that for high dimensional ensembles \((X_i)_{i\le N}\) governed by a large number of independent random variables \((Z_i)_{i\le N}\), macroscopic statistics of the ensemble only depend on the laws of \((Z_i)\) through their low moments. Of course, the most classical example of universality is the central limit theorem (CLT), where \((X_i)=(Z_i)\), and the statistic is the normalized sum. Slightly more involved examples are invariance principles, where the limiting Brownian motion only depends on the distribution of the random walk increments through its first and second moments.
Lindeberg’s classical proof of the CLT iteratively replaces \(Z_i\) with \({{\tilde{Z}}}_i\) (Gaussian with the same mean and variance) and shows that the cumulative effect of these replacements is microscopic. This approach has proven to be very robust, and has been generalized e.g., to polynomials \(f(Z_1,\ldots ,Z_N)\) in [29, 34] and more generally, smooth functions with bounded derivatives in [8, 9]. A more combinatorial approach is a moment matching argument, which compares moments of statistics \(f(X_1,\ldots ,X_N)\) to moments of \(f({\tilde{X}}_1,\ldots ,{\tilde{X}}_N)\) and shows that the difference is dominated by the differences in the first few moments of \(Z_i\) and \({\tilde{Z}}_i\).
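As a toy illustration of the moment matching idea (not taken from the paper), consider the fourth moment of the normalized sum \(S_N/\sqrt{N}\) of i.i.d. mean-zero, variance-one variables. The sketch below evaluates it exactly by counting index patterns; the function names are hypothetical, and the Gaussian and Rademacher laws are chosen only as an example of two laws with matching first and second moments.

```python
def fourth_moment_of_normalized_sum(n, m4):
    """Exact E[(S_n / sqrt(n))**4] for S_n = Z_1 + ... + Z_n, where the Z_i are
    i.i.d. with mean zero, variance one, and fourth moment m4."""
    # Expanding S_n**4, only index patterns without a singleton index survive:
    #   all four indices equal -> n terms, each contributing m4
    #   two distinct pairs     -> 3 * n * (n - 1) terms, each contributing 1
    return (n * m4 + 3 * n * (n - 1)) / n**2


GAUSSIAN_M4 = 3.0    # fourth moment of a standard Gaussian
RADEMACHER_M4 = 1.0  # fourth moment of a uniform +/-1 sign


def fourth_moment_gap(n):
    """Difference between the Gaussian and Rademacher values of the statistic."""
    return (fourth_moment_of_normalized_sum(n, GAUSSIAN_M4)
            - fourth_moment_of_normalized_sum(n, RADEMACHER_M4))
```

In this toy case the only terms that distinguish the two laws are those in which a single \(Z_i\) appears four times, and there are too few of them to matter: the gap equals \((3-1)/N\), which vanishes as \(N\rightarrow \infty \).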
With these approaches, universality has been proven in a wide range of ensembles where the relationship between \((X_i)\) and \((Z_i)\) is more complicated. A fundamental example is when \((X_i)\) are the eigenvalues of a random matrix with entries \((Z_i)\). There, the empirical distribution of \((X_i)\) is well-known to have the same limit (e.g., the semi-circle law for Wigner matrices [40]). In the last decade, remarkably, universality has been found to extend to local statistics of the ensemble \((X_i)\) e.g., typical size of gaps between eigenvalues, and k-point correlations. Universality in random matrix theory has been a tremendous success and we cannot hope to do justice to the literature therein; we instead refer to the seminal works [19, 37] and the surveys [20, 38].
Another class of ensembles for which universality has been shown is disordered interacting particle systems from statistical physics, and in particular the family of mean-field spin glass models. A canonical example is a spin glass where N particles in states \((X_i)\) interact through a random symmetric coupling matrix (or in the case of higher order interactions, tensor) composed of independent entries \(Z_i\). More precisely, with these interactions, they are endowed with an energy landscape, or Hamiltonian, that is topologically complex, and \((X_i)\) are drawn from the corresponding Gibbs distribution. The statistics of \((X_i)\) in such families of spin glasses have been found to exhibit an extremely rich and varied phase diagram featuring phenomena like breaking of ergodicity and replica symmetry [33]. Most of their analysis, including the calculation of the free energy and the proof of the celebrated Parisi formula for the overlap distribution, was first carried out in the Gaussian setting [22, 32, 36]. Talagrand later showed that these also held in the case of Bernoulli \((Z_i)\) in [35]; this universality was extended to general \((Z_i)\) as an application of [9].
The dynamics (Markov processes exploring the Hamiltonian) for such spin glass models are a prototype and motivating force for this paper. The general setting we consider here is that of a system of N linearly coupled SDE’s, where the couplings are encoded in a random matrix \({\mathbf {J}}\), and driven by N independent Brownian motions. That is, \({\mathbf {X}}_t = (X_1(t),\ldots ,X_N(t))\) is the solution to the SDS
where \({\mathbf {J}}\) is a random matrix with independent entries (up to, possibly, a symmetry constraint) and variance profile \({\mathbf {m}}= (m_{ij})_{i,j}\) scaled such that \({\mathbb {E}}[\Vert {\mathbf {J}}\Vert _2] =O(1)\), \({\mathbf {h}}\) is a bounded drift vector, and \({\varvec{\Sigma }}\) is an affine transform of \({\mathbf {X}}_t\). Note that for \({\varvec{\Sigma }}({\mathbf {X}}_t)\) non-constant, we do not expect to have an explicit closed-form solution to (1.1).
In the \(N\rightarrow \infty \) limit, the diffusions of (1.1) encompass many interesting and well-studied models of Markov processes with random coefficients, and give rise to rich and varied behavior. This includes metastability, aging, and non-Markovian limiting evolution equations, in e.g., randomly coupled (geometric) Brownian motions, and Langevin dynamics and gradient flows for the spherical Sherrington–Kirkpatrick (SK) spin glass and symmetric and asymmetric Hopfield nets [6, 13, 25,26,27]: concrete applications are described in Sect. 1.4. In many such examples, the analysis is more tractable when \({\mathbf {J}}\) is Gaussian and one can use tools like Gaussian integration by parts, Girsanov, and the rotational invariance of the Gaussian ensemble.
In this paper, we develop a simple combinatorial framework for proving universality for the solution trajectories of SDS’s of the form (1.1). Before describing our approach, we explain a few difficulties one encounters when trying to prove universality for solutions of randomly coupled dynamical systems, using some of the approaches described above for other universality results. We begin by considering a Lindeberg approach where we examine the effect that re-sampling one \(J_{ij}\) has on an averaged statistic \(F(t) = F(X_1(t),\ldots ,X_N(t))\). The obstacle in employing such an approach is that changing \(J_{ij}\) to \({\tilde{J}}_{ij}\) on \(X_j(t)\), say, beyond affecting the drift
of the j-th coordinate of the SDS, also induces a highly non-linear effect both on \(X_j(t)\) and on \(X_i(t)\) for all \(i\ne j\). The problem instead lends itself to comparing the effect of \({\mathbf {J}}\rightarrow {\tilde{{\mathbf {J}}}}\) in a more averaged way.
An alternative approach would be to use the linear structure of the problem in a strong way, relying on sharp universality results on the spectra of random matrices to study the problem. This approach, while feasible if \({\varvec{\Sigma }}({\mathbf {X}}_t)\) is constant, requires one to diagonalize the problem without loss of generality—i.e., it requires an assumption of joint rotational invariance for the laws of \(({\mathbf {X}}_0, {\mathbf {J}},{\mathbf {B}})\). In [2], such an approach is followed for analyzing the dynamics of the spherical SK model, and their results hold assuming the law of \({\mathbf {J}}\) is invariant under the orthogonal group, and its spectrum satisfies certain large deviation estimates satisfied by the GOE. However, this restriction would not include the cases of e.g., the uniform measures on \([-1,1]^N\) and \(\{\pm 1\}^N\) absent the rotational symmetry, and could not include the case of non-constant \({\varvec{\Sigma }}({\mathbf {X}}_t)\).
Very recently, [17] proved a universality result for the asymmetric Langevin dynamics of the soft-spin SK model. There they used large deviations theory to obtain exponential control on the empirical measure on sample paths (as obtained in the Gaussian setting in [6, 7]), together with sharp control via Girsanov’s theorem on the Radon–Nikodym derivative between the Gaussian paths and those driven by non-Gaussian \({\mathbf {J}}\) on short time scales, to show universality for the empirical measure \({\mathcal {L}}_N = \frac{1}{N}\sum _{i} \delta _{X_i(t)}\). While such an approach allows for a deterministic non-linearity in the drift through a (double-well) confining potential, it cannot handle degenerate diffusions, e.g. the gradient flow. Further, the need for control on the trajectories at the exponential scale forces [17] to consider only asymmetric i.i.d. \({\mathbf {J}}\) (whereby the Radon–Nikodym derivative is a product of functions of independent rows of \({\mathbf {J}}^T\)).
We introduce a simple combinatorial approach to proving universality for SDS’s of the form of (1.1), similar in flavor to the moment method. Namely, we avoid the inherent difficulty of the problem, that the transformation \({\mathbf {J}}\rightarrow {\tilde{{\mathbf {J}}}}\) affects \(X_j(t)\) through both \((J_{ij})_i \rightarrow ({\tilde{J}}_{ij})_i\) and \((X_i(t))_i \rightarrow ({\tilde{X}}_i(t))_i\). We do so by Taylor expanding the semigroup \(P_t f= {\mathbb {E}}_{{\mathbf {X}}_0} [f({\mathbf {X}}_t)]\) in powers of the infinitesimal generator: each term appearing in this expansion is a polynomial in \((x_i),(J_{ij})\) evaluated at \({\mathbf {X}}_0\) where, crucially, the initial data is independent of \(J_{ij}\). One then finds that on order one timescales, the predominant contribution to \({\mathbb {E}}[P_t f]\) is from polynomials whose degree in \((J_{ij})_{i,j}\) is at most two. We refer to Sect. 1.3 for more details.
This approach works quite generally, and is robust to symmetric and asymmetric choices of \({\mathbf {J}}\) with non-homogeneous means and variances, and general choices of diffusion coefficients in (1.1), including \({\varvec{\Sigma }}({\mathbf {X}}_t)\) non-constant making the diffusion non-linear, and \({\varvec{\Sigma }}\equiv 0\) corresponding to a deterministic dynamical system. Lastly, the analysis works for arbitrary initialization independent of \({\mathbf {J}}\). The assumption of linear drift is, of course, important, and one would like to be able to drop it. We emphasize, though, that this is primarily used in order to justify the absolute convergence of the Taylor expansion of the semigroup, which one could hope to justify by other means for higher order diffusions given that a strong solution exists; the remaining combinatorial framework for moments of the generator may then generalize. We discuss this in Remark 1.5.
We end this section by mentioning two recent results [1, 10] showing universality for a Lipschitz family of approximate message passing (AMP) algorithms—a discrete-time state evolution that has found many applications to inference and optimization in high dimensions. Some of the ideas there appear similar in spirit to our approach, using a combinatorial approach to control moments of the final state of the AMP. All the same, the general setting of (1.1) introduces many key differences e.g., the diffusions of (1.1) are in general non-linear, not globally Lipschitz, and have a built-in stochasticity.
1.1 Setup: diffusions with random linear interactions
Consider an N-dimensional stochastic differential system with a mixture of random and deterministic linear interactions, along with, possibly, some constant drifts. More precisely, consider the SDS \({\mathbf {X}}_t := (X_{i}(t))_{i=1}^{N}\) driven by the following parameters.
Suppose that for some matrix \({\mathbf {m}}= (m_{ij})_{i,j}\) we have random interactions given by the random matrix
We assume that the entries \(A_{ij}\) are either fully independent, or are independent up to a symmetry constraint \(A_{ij} = A_{ji}\). Let \({\mathbb {P}}_{{\mathbf {A}}}\) be the law of \({\mathbf {A}}\). In order to scale the interactions to have an order one cumulative effect, it will be convenient to work with the rescaled interaction matrix \({\mathbf {J}}\) given by
We then denote the distribution induced by \({\mathbb {P}}_{\mathbf {A}}\) on \({\mathbf {J}}\) by \({\mathbb {P}}_{{\mathbf {J}}}\).
We further consider additional deterministic interactions satisfying, for some constant \(C_{{\varvec{\Lambda }}}<\infty \),
(the \(\Vert \cdot \Vert _0\)-norm of a vector is its number of non-zero entries). We also consider external drift parameters
and diffusion coefficients \({\varvec{\Sigma }}({\mathbf {X}}_t)\) governed by the matrix
The SDS \(({\mathbf {X}}_t)_{t\ge 0}=(X_{1}(t),X_{2}(t),\ldots ,X_{N}(t))_{t\ge 0}\) initialized from some random \({\mathbf {X}}_0\) distributed according to a product measure \(\mu \) is driven by a standard Brownian motion \({\mathbf {B}}_t = (B_1(t),\ldots ,B_N(t))\) as follows
where for ease of notation, we hereon set \(X_0(t) \equiv 1\) so that \((\sigma _{0j})_{j\ge 1}\) capture the constant diffusion coefficients. We denote the martingale part of \({\mathbf {X}}_t\) by
The process \({\mathbf {X}}_t\) is well-defined for a.e. \({\mathbf {J}}\) and all \(t \ge 0\) (as we have finite, possibly N-dependent operator norms \(\Vert {\mathbf {J}}\Vert _{2}\), \(\Vert {\varvec{\Lambda }}\Vert _2\) and \(\Vert (\sigma _{ij})_{i \ge 1}\Vert _2\), see e.g., [31, Theorem 5.2.1]).
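To make the setup concrete, here is a minimal numerical sketch. It is not the paper's method, and it rests on several simplifying assumptions that are ours: the system is taken to be \(d{\mathbf {X}}_t = {\mathbf {J}}{\mathbf {X}}_t\,dt + \sigma \,d{\mathbf {B}}_t\) with a scalar constant \(\sigma \), no deterministic interactions and no external drift, \({\mathbf {J}} = {\mathbf {A}}/\sqrt{N}\) with fully independent unit-variance entries, and an Euler–Maruyama discretization. All function names are hypothetical.

```python
import numpy as np


def sample_coupling(n, law, rng):
    """Rescaled coupling J = A / sqrt(n) with i.i.d. mean-zero, variance-one A_ij."""
    if law == "gaussian":
        a = rng.standard_normal((n, n))
    elif law == "rademacher":
        a = rng.choice([-1.0, 1.0], size=(n, n))
    else:
        raise ValueError(f"unknown law: {law}")
    return a / np.sqrt(n)


def mean_square_at_time(j, x0, sigma, t_final, n_steps, rng):
    """Euler-Maruyama for the ASSUMED system dX = J X dt + sigma dB.

    Returns the averaged observable (1/N) * sum_i X_i(t_final)**2.
    """
    dt = t_final / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape[0])
        x = x + (j @ x) * dt + sigma * np.sqrt(dt) * noise
    return float(np.mean(x**2))


def compare_laws(n=300, sigma=1.0, t_final=1.0, n_steps=200, seed=0):
    """Evaluate the same averaged observable under two coupling laws."""
    rng = np.random.default_rng(seed)
    out = {}
    for law in ("gaussian", "rademacher"):
        j = sample_coupling(n, law, rng)
        x0 = rng.standard_normal(n)  # product initial law, drawn independently of J
        out[law] = mean_square_at_time(j, x0, sigma, t_final, n_steps, rng)
    return out
```

Calling `compare_laws()` returns one value of the observable per law. The sketch only provides a way to inspect the two values side by side for increasing `n`; it does not quantify discretization or sampling error.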
Notational comment There are three distinct sources of randomness above dictating the law of the solution \({\mathbf {X}}_t\) to (1.2): the law of the interaction matrix \({\mathbb {P}}_{{\mathbf {J}}}\), the law of the Brownian motions, denoted \({\mathbb {P}}_{{\mathbf {B}}}\), and the law of the initial data \(\mu \). Each of these is a product measure, and we do not distinguish notationally between the law of the individual entries of \({\mathbf {J}},{\mathbf {B}}\) or \({\mathbf {X}}_0\) and the ensembles.
In proving universality, we consider the difference between \({\mathbb {P}}_{{\mathbf {J}}}, {\mathbb {P}}_{{\tilde{{\mathbf {J}}}}}\) induced by two different distributions \({\mathbb {P}}_{{\mathbf {A}}}\) and \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) over mean-zero random matrices \({\mathbf {A}}, {\tilde{{\mathbf {A}}}}\) with independent entries (possibly up to symmetry), having matching variance profiles \({\mathbf {m}}= {\tilde{{\mathbf {m}}}}\). For ease of notation, we will henceforth use
and denote the corresponding expectations \({\mathbb {E}}\) and \({\tilde{{\mathbb {E}}}}\) respectively.
1.2 Main results
We begin by describing the observables to which our universality results apply. The building blocks of these observables are chosen among the family of vector valued functions,
We establish universality in the mean for weighted empirical averages of monomials in functions from \({{\mathfrak {F}}}\) evaluated at a finite collection of times. Specifically, fixing an m-tensor \({\mathbf {a}}= (a_{i_1,\ldots ,i_m})\) with entries bounded by \(C_{\mathbf {a}}\) and a p-tuple of times \({\mathbf {t}}= (t_1,\ldots ,t_p)\), for every \(\ell \le m\), fix p observables \({\mathcal {Y}}^{(\ell ,1)},\ldots ,{\mathcal {Y}}^{(\ell ,p)} \in {{\mathfrak {F}}}\) which are to be evaluated at these p times. That is,
We also need to add a sub-exponential tail constraint on \(\mu \) and \({\mathbb {P}}_{\mathbf {A}}\) beyond the minimal assumptions of zero-mean and matching variances of \({\mathbb {P}}_{{\mathbf {A}}}\) and \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\); this is henceforth referred to as Hypothesis 1.
Hypothesis 1
Assume that the law \(\mu \) is a product of the laws \(\mu _i\) of \(X_i(0)\), which have finite moments of all orders, bounded uniformly over i and N. That is, there exist \(C_\mu (r) \ge 1\) such that for any r finite,
Further assume \({\mathbb {P}}_{\mathbf {A}}\) has uniformly bounded exponential tails, i.e., the following equivalent properties hold:
For ease of notation for dependencies on constants, we denote by \({\mathbf {C}}_\star := \max \{C_{{\mathbf {A}}}^{1/2},C_{\tilde{{\mathbf {A}}}}^{1/2},C_{\varvec{\Lambda }},C_{\mathbf {h}}, C_{\varvec{\sigma }}^2 \}\) (where \(C_{{\tilde{{\mathbf {A}}}}}\) is the constant \(C_{\mathbf {A}}\) with respect to distribution \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\)), and state our first result, on universality at the level of the mean (hence also of moments), for observables (1.5).
Theorem 1
Let \(\mu , {\mathbb {P}}_{\mathbf {A}}, {\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) satisfy Hypothesis 1 and suppose that \({\mathbf {A}}, {\tilde{{\mathbf {A}}}}\), symmetric or independent, are mean-zero of matching variance profile \({\mathbf {m}}= (m_{ij})_{i,j}\). For any \(T,m,p<\infty \) and \({\mathbf {a}}\in {\mathbb {R}}^{N^m}\) with \(\Vert {\mathbf {a}}\Vert _\infty \le C_{{\mathbf {a}}}\), there exists \(C (T,m,p,C_{\mathbf {a}}, {\mathbf {C}}_\star ,C_\mu )<\infty \), such that for every N and F as in (1.5) with \(({\mathcal {Y}}^{(\ell ,1)},\ldots , {\mathcal {Y}}^{(\ell , p)}) \in {{\mathfrak {F}}}\),
In particular, \(\big | {\mathbb {E}}[ F({\mathbf {t}})] - {\tilde{{\mathbb {E}}}}[ F({\mathbf {t}})] \big | \rightarrow 0\) as \(N \rightarrow \infty \), uniformly in \({\mathbf {t}}\in [0,T]^p\).
Theorem 1 follows from a more general result bounding the difference in expectations for each individual monomial \(F_i^{(\ell )}\) from (1.5) with \(({\mathcal {Y}}^{(\ell ,1)},\ldots , {\mathcal {Y}}^{(\ell , p)}) \in {{\mathfrak {F}}}\). As a special case, see Proposition 2.1, we find that the moments of each spin \(X_i(t)\) are universal. Specifically, for every fixed k,
For a more restricted class of observables, with additional restrictions on the distributions \(\mu \) and \({\mathbb {P}}_{{\mathbf {A}}}\) and \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\), we extend the above to almost sure and \(L^q\) convergence for the observable trajectories. Precisely, we restrict the observables of (1.5) to \(m=1\) and \(p=2\), leaving the following quadratic observables
In order to extend Theorem 1 to a convergence for the trajectories of these observables, we further need to assume that \({\varvec{\Sigma }}\) is constant, so that \({\mathbf {M}}_t\) is just a scaled Brownian motion, and assume the following concentration property on \(\mu , {\mathbb {P}}_{\mathbf {A}}, {\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\), which we refer to as Hypothesis 2.
Hypothesis 2
A sequence of probability measures \(({\mathbb {P}}^{(n)})_{n\ge 1}\) over \({\mathbf {Z}}_n\) in metric spaces \(({\mathcal {X}}_n,d)\) satisfies exponential concentration for Lipschitz functions if there exists some \(C>0\) such that for any sequence of 1-Lipschitz functions \(f_n: ({\mathcal {X}}_n, d) \rightarrow ({\mathbb {R}}, |\cdot |)\) and all \(\lambda >0\),
Assume that \(\mu ,{\mathbb {P}}_{{\mathbf {A}}}\) respectively satisfy exponential concentration for Lipschitz functions on \({\mathbb {R}}^N\) and \({\mathbb {R}}^{N^2}\) (or \({\mathbb {R}}^{N(N+1)/2}\) if \({\mathbf {A}}\) is symmetric), equipped with their Euclidean norms, for some \(C_\mu ,C_{\mathbf {A}}>0\).
Remark 1.1
Recall, from the theory of measure concentration, that Hypothesis 2 holds for any distribution on \({\mathbb {R}}^n\) which satisfies a Poincaré inequality with constant \(c>0\) (independent of n), namely for all nice f one has that \(\text{ Var } [ f({\mathbf {Z}}_n) ] \le c {\mathbb {E}}[ |\nabla f({\mathbf {Z}}_n)|^2]\) (see [21]). By the tensorization of the Poincaré inequality, if \({\mathbf {Z}}_n = (Z_1,\ldots ,Z_n)\), and the law of each \(Z_i\) satisfies this inequality, then the product also satisfies it with the worst constant c. Having here product measures \(\mu , {\mathbb {P}}_{\mathbf {A}}\), the marginal laws can come from any distribution satisfying a Poincaré inequality in \(n=1\). These include (see e.g., [39])
- Exponential, Gaussian, and log-concave measures of the form \(\exp (- V(x))\) for V(x) strictly convex,
- Linear functionals of r.v.’s having a Poincaré inequality: e.g., the uniform measure on \([-1,1]\).
The next theorem shows that under Hypothesis 2, any F of the form (1.10) concentrates around its mean.
Theorem 2
Suppose \(\mu \), \({\mathbb {P}}_{\mathbf {A}}\) satisfy Hypotheses 1–2 and the diffusion coefficients have \(\sigma _{ij}=0\) if \(i\ne 0\). Then, for some \(C(T,C_{\mathbf {a}}, {\mathbf {C}}_\star , C_\mu )>0\), any \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\), every F as in (1.10) with \({\mathcal {Y}},{\mathcal {Y}}'\in {{\mathfrak {F}}}\), all \(\lambda >0\) and \(N \ge N_0(T,C_{\mathbf {a}},{\mathbf {C}}_\star ,C_\mu )\),
(One might observe that the \(\exp ( - \Omega (\sqrt{N}))\) concentration in (1.12) differs from the more traditional \(\exp ( - \Omega (N))\) concentration in e.g. [2, 3]; such differences, which recur throughout the paper, are because our Hypothesis 2 allows for merely sub-exponential, as opposed to Gaussian, tails.) Combining Theorems 1 and 2 we get the following strong universality for such quadratic observables.
Corollary 3
Suppose \(\mu , {\mathbb {P}}_{\mathbf {A}}, {\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) satisfy Hypotheses 1–2, where \({\mathbf {A}}, {\tilde{{\mathbf {A}}}}\), symmetric or independent, are mean-zero and have matching variance profile \({\mathbf {m}}= (m_{ij})_{i,j}\). Let \(F(\cdot )\) and \({\tilde{F}}(\cdot )\) be as in (1.10), for \({\mathbf {a}}\in {\mathbb {R}}^{N}\) such that \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\), with respect to the corresponding solutions \({\mathbf {X}}_t\), \({\tilde{{\mathbf {X}}}}_t\) for (1.2) with constant \({\varvec{\Sigma }}\), i.e., \(\sigma _{ij}=0\) if \(i \ne 0\). Then, for every \(T<\infty \) we have that as \(N \rightarrow \infty \),
Proof
The observables of (1.10) correspond to the \(m = 1\) and \(p=2\) case of (1.5), so Theorem 1 applies here with some constant \(C_1=C(T,m,p,C_{\mathbf {a}}, {\mathbf {C}}_\star , C_\mu )\). For \(N \ge (\lambda /C_1)^2\) we then get upon combining the triangle inequality with Theorems 1–2, that
Since \(\sum _N p_N(\lambda ) < \infty \) for any fixed \(\lambda >0\), by Borel-Cantelli \(Z_N {\mathop {\rightarrow }\limits ^{a.s.}} 0\) as \(N \rightarrow \infty \). Similarly, upon using the triangle inequality for \(\Vert \cdot \Vert _q\) we get from Theorems 1 and 2 that
Further, \(N \mapsto p_N(\cdot )\) decreases pointwise on \([C,\infty )\), while for any \(q\ge 1\), the preceding integral is finite for all N large enough. With \(\{Z_N^q\}_N\) uniformly integrable, it follows that \(Z_N \rightarrow 0\) also in \(L^q\). \(\square \)
1.3 Proof strategy
As mentioned in the introduction, traditional approaches to proving universality run into substantial difficulty when we apply them to diffusions with random coefficients. The dependence on specific entries of the random matrix is quite bad, as an entry \(J_{ij}\) enters the drift both directly and through its effect on \({\mathbf {X}}_t\), whose history evidently also depends on \(J_{ij}\): this effect can exponentially amplify small differences; in fact, the exponential amplification is inherent to the problem at hand.
At a high level, our strategy for proving Theorem 1, and the main novelty of the paper, is to leverage the independence of \(\mu \) from \({\mathbb {P}}_{{\mathbf {J}}},{\mathbb {P}}_{{\tilde{{\mathbf {J}}}}}\) by pulling back \(f({\mathbf {X}}_t)\) and \(f({\tilde{{\mathbf {X}}}}_t)\) to properties of (time) derivatives of \(f({\mathbf {X}}_t)\) evaluated at \(t=0\). At the level of expectations, these derivatives can be seen as iterates of the infinitesimal generator applied to the function F, which can then be controlled by combinatorial moment methods. The dominant contribution to the drift of F comes from drift terms that are polynomials of degree at most two in \((J_{ij})_{ij}\). Since the first two moments of \({\mathbb {P}}_{{\mathbf {A}}}\) and \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) match, these terms do not contribute to the difference in expectations above. We emphasize that the approach does not need to rely on an explicit solution to the SDE of (1.2), nor does it use exponential control, or large deviations theory as in [17], or refined estimates on the spectrum of \({\mathbf {A}}\) as in the setting of [2] where, crucially, the process has a rotational symmetry.
Recall that the SDE defined in Eq. (1.2) has infinitesimal generator L that we split as follows (see e.g., [31, Theorem 7.3.3]):
By Itô’s formula, we have for every f, say, in \(C^\infty ({\mathbb {R}}^N)\),
where \(P_t = P_t ({\mathbf {J}})\) denotes the semi-group operator
in terms of the generator L. In order to reduce the problem to a combinatorial question, we wish to Taylor expand the semi-group operator \(P_t f = e^{tL}f\). As long as f is smooth and the Taylor expansion converges absolutely—shown in Sect. 2.2—this formal expansion is valid and we can switch expectations over \(\mu , {\mathbb {P}}_{\mathbf {J}},{\mathbb {P}}_{\tilde{{\mathbf {J}}}}\) with the sum, and compute expectations of powers of the generator L acting on f. Namely, the difference in expectations is bounded by controlling (1) the size in N, and (2) the growth in k of
Expanding these terms as words in \(L_{\mathbf {J}},L_{\varvec{\Lambda }},L_{\mathbf {h}},L_\Delta \), we observe that a non-zero difference between the two expectations in (1.15), can only come from the summands (monomials in \({\mathbf {J}},{\mathbf {X}},{\varvec{\Lambda }}, {\mathbf {h}},{\varvec{\sigma }}\)) satisfying
- Every \(J_{ij}\) that is present must appear at least twice.
- At least one \(J_{ij}\) must appear at least three times.
This is because the means of \({\mathbb {P}}_{\mathbf {A}},{\mathbb {P}}_{\tilde{{\mathbf {A}}}}\) are zero, and the variances of \({\mathbb {P}}_{\mathbf {A}}\) and \({\mathbb {P}}_{\tilde{{\mathbf {A}}}}\) match. A careful analysis of this combinatorial problem for the monomials eventually yields that the contributions from these monomials are, together, \(O(N^{-1/2})\) in N, and o(k!) in k: this computation is carried out in Sect. 2.3.
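The two conditions above can be checked on toy monomials. The following sketch is an illustration under our own simplifying assumptions (unit-variance entries rather than a general variance profile, and hypothetical function names): it computes the exact expectation of a product of independent entries, given how many times each present entry appears, under a standard Gaussian law and under a skewed two-point law with the same mean and variance.

```python
import math


def gaussian_moment(k):
    """E[Z**k] for a standard Gaussian: 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    return float(math.prod(range(1, k, 2)))


def two_point_moment(k, p=0.25):
    """E[Z**k] for the mean-zero, variance-one law on two atoms:
    Z = sqrt((1 - p) / p) with probability p, Z = -sqrt(p / (1 - p)) otherwise."""
    hi = math.sqrt((1 - p) / p)
    lo = -math.sqrt(p / (1 - p))
    return p * hi**k + (1 - p) * lo**k


def monomial_expectation(multiplicities, moment):
    """E[prod_e A_e**k_e] for independent entries A_e sharing the given moments;
    multiplicities lists k_e >= 1 for each entry present in the monomial."""
    return math.prod(moment(k) for k in multiplicities)


def can_contribute(multiplicities):
    """The two conditions from the text: no present entry appears exactly once,
    and at least one entry appears at least three times."""
    return (all(k >= 2 for k in multiplicities)
            and any(k >= 3 for k in multiplicities))
```

For instance, with multiplicities `(2, 2, 2)` both laws give the same expectation, and with `(1, 2)` both give zero up to floating-point rounding; a discrepancy between `monomial_expectation(m, gaussian_moment)` and `monomial_expectation(m, two_point_moment)` can only show up when `can_contribute(m)` is true, as with `(3, 2)`.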
Remark 1.2
One may notice that in the case where \({\varvec{\Sigma }}({\mathbf {X}}_t)\) is constant so that \({\mathbf {M}}_t\) is just a Brownian motion, we are left with a linear SDS and one could use this linearity in a more central way, to explicitly solve expectations of monomials in \((X_i(t))_i\) as Gaussian integrals and time integrals over words in \(e^{ s {\mathbf {J}}}\) and \((X_i(t))_i\). If the system \({\mathbf {X}}_t\) is invariant under rotations, then we can work in the coordinates of \({\mathbf {J}}\) so that it is diagonal and apply universality results for the spectrum of \({\mathbf {J}}\). Absent rotational symmetry, however, the natural step would be to Taylor expand \(e^{s{\mathbf {J}}}\), at which point the expansion and the resulting combinatorics will be similar, and perhaps less transparent, than our generator based approach. Of course, for non-constant \({\varvec{\Sigma }}({\mathbf {X}}_t)\) as in Theorem 1, the SDS is non-linear, and such an approach would not generalize.
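The Taylor expansion of \(e^{s{\mathbf {J}}}\) mentioned in this remark can be illustrated in the simplest deterministic case. Assuming, for illustration only, that there is no noise, no external drift and no deterministic interaction, the system reduces to \(\frac{d}{dt}{\mathbf {X}}_t = {\mathbf {J}}{\mathbf {X}}_t\) with solution \(e^{t{\mathbf {J}}}{\mathbf {X}}_0\), and the partial sums \(\sum _{k\le K} \frac{t^k}{k!}{\mathbf {J}}^k{\mathbf {X}}_0\) are polynomials in the entries of \({\mathbf {J}}\) evaluated at the initial data. The sketch below compares such a partial sum with the solution obtained by diagonalizing a symmetric \({\mathbf {J}}\); the function names and the choice of sign-valued couplings are ours.

```python
import numpy as np


def taylor_flow(j, x0, t, order):
    """Partial sum sum_{k <= order} (t**k / k!) J**k x0 of exp(t J) x0."""
    term = np.array(x0, dtype=float)  # k = 0 term
    total = term.copy()
    for k in range(1, order + 1):
        term = (t / k) * (j @ term)   # turns the (k-1)-th term into the k-th
        total += term
    return total


def exact_flow_symmetric(j, x0, t):
    """exp(t J) x0 for a symmetric J, computed from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(j)
    return eigvecs @ (np.exp(t * eigvals) * (eigvecs.T @ x0))


def symmetric_sign_coupling(n, rng):
    """Symmetric J with i.i.d. +/-1 entries above the diagonal and zero
    diagonal, rescaled by 1/sqrt(n)."""
    upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
    return (upper + upper.T) / np.sqrt(n)
```

For a time of order one and a coupling matrix of bounded operator norm, a few dozen terms of the series already agree with the diagonalized solution to high accuracy, in line with the k! in the denominator outweighing the growth of \(\Vert {\mathbf {J}}\Vert ^k\).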
In Sect. 3, we extend this bound on the difference in expectations of statistics f to multi-time observables, then to statistics that contain the driving martingale terms and finally establish the universality at the level of expectation for observables of the form of (1.5), as stated in Theorem 1. In Sect. 4, we adapt the approach of [3] to establish Theorem 2, namely, to show that the restricted class of observables of (1.10) concentrate around their expectations, by localizing to a set of large probability where F is \(O(N^{-1/2})\)-Lipschitz in the triplet \(({\mathbf {X}}_0,{\mathbf {J}}, ({\mathbf {M}}_t)_{t\in [0,T]})\) and using Hypothesis 2.
1.4 Applications
In this section, we discuss systems for which Theorems 1–2 and Corollary 3 imply concrete universality results. All the examples that follow will be in the context of \({\varvec{\Sigma }}\) that is constant, i.e., \(\sigma _{ij}= 0\) if \(i\ne 0\), where both Theorems 1 and 2 apply. Among the examples with non-constant \({\varvec{\Sigma }}\), one which may be of interest is a system of geometric Brownian motions interacting linearly through \({\mathbf {J}}\).
We next describe two well-studied families of Markov processes/dynamical systems to which our results apply: Langevin dynamics and gradient flows on various energy landscapes (Hamiltonians) or loss functions.
1.4.1 Langevin dynamics
In the case where \({\mathbf {J}}\) and \({\varvec{\Lambda }}\) are symmetric matrices, and \(\sigma _{0j}\) are identically one, (1.2) corresponds exactly to the Langevin dynamics for the Hamiltonian
The linearity of the diffusion here corresponds to having a quadratic Hamiltonian. The Langevin dynamics is a reversible Markov process designed such that, when non-degenerate, its invariant measure on \({\mathbb {R}}^N\) is given by \(d\pi ({{\mathbf {x}}}) \propto e^{ - H({{\mathbf {x}}})} d{{\mathbf {x}}}\). For Hamiltonians coming from spin glass theory, the Langevin dynamics has been analyzed at length in the case of Gaussian disorder, and found to have a varied and rich behavior; in §1.4.3, we explore this further in the context of a simple spin glass model, called the spherical SK model.
1.4.2 Gradient flows
The case where \(\sigma _{0j}\) are identically zero (i.e., besides the randomness of \({\mathbf {J}}\) and, possibly, the initial data, the dynamics is deterministic) also fits into the framework of the paper. Here, given \({\mathbf {J}}\) and \({\mathbf {X}}_0\), the law of the dynamics is taken to be the delta function on the trajectory of the solution to the resulting system of ODE’s. This corresponds to the gradient flow on \(H({{\mathbf {x}}})\): in optimization and learning settings, e.g., the examples of Sects. 1.4.2–1.4.3, gradient descent and its many variants are favored methods.
We now turn to a few well-studied concrete problems to which our results are applicable.
1.4.3 The (soft) spherical SK model
The dynamics of spin glasses are a canonical setting in which Markov processes with random coefficients are studied in their thermodynamic (\(N\rightarrow \infty \)) limit. The short-time (\(N\rightarrow \infty \), then \(T\rightarrow \infty \)) behavior of Langevin dynamics, especially in the context of spin glasses, has been extensively studied in both the physics and math literature [2,3,4,5,6,7, 11, 12, 15, 18]. Perhaps the most well-known mean field spin glass is the Sherrington–Kirkpatrick (SK) spin glass, where N spins taking values in \(\{+1,-1\}\) interact pairwise with one another, and their interaction strengths are moderated by “coupling” parameters \(J_{ij} = J_{ji}\) which are drawn i.i.d., say, Gaussian. We discuss a simplification of this known as the spherical SK model, which has been found to nevertheless exhibit some of the same phenomena.
Take an i.i.d. symmetric matrix \({\mathbf {J}}=(J_{ij})_{ij}\) with law \({\mathbb {P}}_{{\mathbf {J}}}\). The spherical SK model has Hamiltonian
To avoid differential geometry on the sphere, it is sometimes preferable to extend the Hamiltonian to all \({{\mathbf {x}}}\in {\mathbb {R}}^{N}\) (note that the Hamiltonian is homogeneous so that dividing \({{\mathbf {x}}}\) by the Euclidean norm \(\Vert {{\mathbf {x}}}\Vert /\sqrt{N}\) gives the same process on \({\mathbb {S}}^{N-1}(\sqrt{N})\)). Instead of adding a non-linear confining force as is done in, e.g., [2], we either add a linear confining force \(F_K(x) = K x\), or have no confinement (\(K = 0\)) (the linearity of the system ensures no finite time blowup). Consider now the Langevin dynamics at inverse temperature \(\beta >0\) for the Hamiltonian of (1.17), corresponding to \({\mathbf {X}}_t = {\mathbf {X}}_{t}^{(\beta )}\) solving the SDS
We also consider the gradient flow where we take \(\beta = \infty \), so that the Brownian motion term drops out: \({\mathbf {X}}_t\) is then the (deterministic) dynamical system following the (random) gradient vector field of \(H({{\mathbf {x}}}) + F_K(\Vert {{\mathbf {x}}}\Vert ^2/N)\). The following universality for the above system is an immediate corollary of Theorem 3.
Corollary 1.3
Fix \(\beta \in (0,\infty ]\) and consider the SDS’s \({\mathbf {X}}_t\) and \({\tilde{{\mathbf {X}}}}_t\) given by (1.18) for \({\mathbf {A}}\) and \({\tilde{{\mathbf {A}}}}\) having mean zero, matching variance profiles \(m_{ij} = {\mathbf {1}}\{i\ne j\}\). Suppose \(\mu \) is independent of \({\mathbb {P}}_{{\mathbf {A}}}, {\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) and these satisfy Hypotheses 1–2. Then for F as in (1.10) with \({\mathcal {Y}}, {\mathcal {Y}}'\in {{\mathfrak {F}}}\) and \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\), for every \(T<\infty \),
As shown in [14] and rigorously proved in [2], when \({\mathbf {J}}\) is Gaussian, the spherical SK model, or the soft spherical SK model with confining potential F satisfying \(F(x)/x\rightarrow \infty \) as \(x\rightarrow \infty \), exhibits a sharp aging transition. Informally, aging is defined as the notion that the older a system gets, the more it remembers its past; formally, it corresponds to a transition in the behavior of the auto-correlation,
between a (FDT) regime where \(C_N(s,t) \sim \Phi (t-s)\) and an aging regime where \(C_N(s,t) \sim \Phi (\frac{t}{s})\) for large s, t. In [2], it was established that for \({\mathbf {J}}\) having rotationally invariant law, e.g., a GOE matrix, \(C_N(s,t)\) solves a non-linear equation [2, Eq. (2.16)], which exhibits exactly this type of transition at some \(\beta _{\mathrm{ag}}\). Our results allow us to read off universality for this limiting behavior, as formalized in the following corollary.
Corollary 1.4
Consider the Langevin dynamics for the soft spherical SK model, as defined in (1.18), where \({\mathbb {P}}_{\mathbf {A}}\) is the law of a Wigner matrix satisfying Hypothesis 2, the confinement is \(F_K(x) = Kx\) for some \(K> {\mathbb {E}}[ \Vert {\mathbf {J}}\Vert _{2\rightarrow 2}]\), and the initialization \(\mu \) is, e.g., standard Gaussian, independent of \({\mathbb {P}}_{\mathbf {A}}\). Then, for every \(\beta \in (0,\infty ]\) and every \(T<\infty \), the limit \((\lim _{N\rightarrow \infty } C_N(s,t))_{s,t\in [0,T]}\) exists, and satisfies [2, Eq. (2.16)].
In the specific case of \(\beta = \infty \), the conclusions of [2, §3.2.2] apply, and the solution exhibits aging: i.e., there is a \(\gamma >0\) (specified therein) such that for every \(\lambda >1\),
Proof
For the first statement, while [2, Theorem 2.6] is stated for confinement F growing super-linearly, following the proof one sees that it is only used to localize the process, for which it suffices for K to exceed \(\Vert {\mathbf {J}}\Vert _{2 \rightarrow 2}\) (which for Wigner matrices is a.s. less than \(2+\epsilon \) for any \(\epsilon >0\)). The first part of the corollary therefore follows from Corollary 1.3 together with the result of [2, Theorem 2.6] showing that for \({\mathbf {A}}\) standard normal, \(C_N(s,t)\) satisfies [2, (2.16)].
For concreteness, the analysis of the limiting equation [2, (2.16)] and the derivation of the aging transition is carried out in [2] only for a specific choice of quadratic F. One could in principle perform the same analyses with other choices of F including \(F=F_K\) that is linear, corresponding to the case we consider, and understand the limiting behavior of \(C_N(s,t)\) as \(N\rightarrow \infty \) then \(s,t\rightarrow \infty \) as \(\beta \) varies. We do not pursue this, and instead notice that in the specific case of \(\beta = \infty \), the homogeneity allows us to disregard the choice of the confining potential and obtain universality for the zero-temperature aging behavior. To see this, since H(x) is a homogeneous polynomial, if \(\beta = \infty \), we see that \(d{\mathbf {X}}_t\) is a constant multiple (for a constant depending only on \(\Vert {\mathbf {X}}_t\Vert \)) of \(d({\mathbf {X}}_t/ \Vert {\mathbf {X}}_t\Vert )\). Therefore, at \(\beta = \infty \), the projection of the dynamics (1.18) onto the sphere \({\mathbb {S}}^{N-1}(\sqrt{N})\) matches the projection of the Langevin SDS of [2], regardless of the choice of confining potential used therein. We apply Corollary 1.3 first to deduce that \(\lim _{s\rightarrow \infty } \lim _{N\rightarrow \infty } C_N(s,s)=: C_\infty \) is the same for Gaussian and non-Gaussian \({\mathbb {P}}_{{\mathbf {A}}}\). Then applying it to \(C_N(s,\lambda s)\), we find that the \(N\rightarrow \infty \) limit of the normalized auto-correlation is the same for Gaussian and non-Gaussian \({\mathbb {P}}_{{\mathbf {A}}}\), and it is further independent of the choice of confining potential: as such for any \({\mathbb {P}}_{{\mathbf {A}}}\), it has the same \(N\rightarrow \infty \) limit as in [2]. \(\square \)
Remark 1.5
It would be of interest to consider similar Langevin dynamics for the spherical or soft spherical p-spin glass models for \(p>2\). Permitting higher order interactions gives rise to a wealth of more complicated models and different behavior. At the level of the off-equilibrium Langevin dynamics, these lead to the famous Cugliandolo–Kurchan/Crisanti–Horner–Sommers limit of coupled integro-differential equations for \(C_N(s,t)\) and an integrated response \(\chi _N(s,t) = \frac{1}{N} \sum _i X_i(s) B_i(t)\) [3, 11, 12, 15, 16, 18, 23], as well as the evolution of other observables e.g., the Hamiltonian and its square gradient [5]. Our combinatorial framework suggests that the differences in expectations (over p-tensors \({\mathbf {J}}\) and \({\tilde{{\mathbf {J}}}}\)) of averaged observables are microscopic, as long as there is a non-linear confining potential to prevent finite-time blowup. The complication is in the fact that the two non-linearities (from the interactions, and the confining potential) cancel out, but these cancellations are not easily seen in the Taylor series obtained by expanding in powers of the generator; thus we are not able to show that this series is absolutely summable and exchange the infinite sum with its expectation.
1.4.4 Symmetric and asymmetric Hopfield networks
Let us also mention a different context in which diffusions of the form of (1.1) appear. Hopfield networks were introduced by [26] and have become one of the simplest and most fundamental examples of neural networks. In this model, a set of N neurons \((X_i)_i\) are either active \(\{+1\}\) or inactive \(\{-1\}\) depending on whether the neuron \(X_i\)’s input \(\sum _j J_{ij} X_j\), for some weights \({\mathbf {J}}= (J_{ij})_{i,j}\), exceeds a deterministic threshold \(h_i\). This model was introduced in the symmetric setting, but has since been analyzed extensively both in symmetric and asymmetric setups [13, 25, 41].
One typically initializes the neurons at some pre-determined state independent of \({\mathbf {J}}\), e.g., all inactive/active, or uniformly at random, and tracks their time-evolution, whereby each neuron activates/de-activates at some rate, depending on the relationship between its input and threshold. Though there are many ways this is implemented, one is to soften the problem to continuous state space, either to the sphere or to full-space, and add in stochasticity by running some Langevin dynamics. This is the approach pursued in [13] as well as, e.g., [41]. Then, with a linear confining force, our results imply universality for both the symmetric and asymmetric Langevin dynamics (and gradient flow) of general Hopfield networks: this includes universality for observables capturing the energy/loss in the network, its square gradient, and its “memory”.
1.4.5 Rayleigh quotient minimization for random matrices
We conclude with a related optimization problem in high dimensions: that of optimizing the Rayleigh quotient of a random matrix \({\mathbf {J}}\) with a certain mean and variance profile. Maximizing the Rayleigh quotient is an efficient way to find the top eigenvector and eigenvalue of the random matrix via local iteration, e.g., either gradient descent or Langevin dynamics at low temperatures (large \(\beta \)). To place this in the framework of (1.2), take \(H({{\mathbf {x}}}) = \langle {{\mathbf {x}}}, {\mathbf {J}}{{\mathbf {x}}}\rangle \) and either no confining force or \(F_K' = K\) for some \(K > \Vert {\mathbf {J}}\Vert _{2 \rightarrow 2}\) in (1.18). In the situation where the matrix ensemble is rotationally invariant, e.g., the GOE, the limiting trajectories of, say, \(H({\mathbf {X}}_t)\) for the gradient flow/Langevin dynamics can be explicitly solved (by diagonalization). Corollary 3 implies these limiting trajectories will be universal, and thus, match the limiting trajectories obtained when \({\mathbf {J}}\) is not Gaussian. In [1, 10], similar universality results were described for an AMP approach to finding the top eigenvalue/eigenvector of \({\mathbf {J}}\).
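As a purely numerical illustration of the diagonalization remark above (it plays no role in the proofs), the following sketch solves a zero-temperature flow for Gaussian and for Rademacher couplings and compares two averaged observables. The linear form \(d{\mathbf {X}}_t/dt = ({\mathbf {M}} - K){\mathbf {X}}_t\) with \({\mathbf {M}}={\mathbf {J}}/\sqrt{N}\), the sign convention, the zero diagonal, the value \(K=2.5\), the all-ones initialization and all function names are assumptions made for this illustration; they are not taken from (1.18).

```python
import numpy as np


def symmetric_matrix(n, kind, rng):
    """Symmetric n x n matrix, zero diagonal, unit-variance entries above the diagonal."""
    if kind == "gaussian":
        raw = rng.standard_normal((n, n))
    elif kind == "rademacher":
        raw = rng.choice([-1.0, 1.0], size=(n, n))
    else:
        raise ValueError(f"unknown kind: {kind}")
    upper = np.triu(raw, k=1)
    return upper + upper.T


def flow_observables(j, x0, times, k_conf):
    """Return (<X_t, M X_t>/N, |X_t|^2/N) along dX/dt = (M - K) X, M = J/sqrt(N).

    The scaling and sign are assumptions of this sketch, not the paper's (1.18).
    """
    n = j.shape[0]
    eigvals, eigvecs = np.linalg.eigh(j / np.sqrt(n))
    coords0 = eigvecs.T @ x0  # initial data written in the eigenbasis of M
    energy, sq_norm = [], []
    for t in times:
        coords_t = np.exp(t * (eigvals - k_conf)) * coords0  # exact solution of the linear ODE
        energy.append(np.sum(eigvals * coords_t**2) / n)
        sq_norm.append(np.sum(coords_t**2) / n)
    return np.array(energy), np.array(sq_norm)


rng = np.random.default_rng(0)
n, k_conf = 400, 2.5
times = np.linspace(0.0, 0.5, 6)
x0 = np.ones(n)  # initialization chosen independently of J

energy_gauss, norm_gauss = flow_observables(symmetric_matrix(n, "gaussian", rng), x0, times, k_conf)
energy_rad, norm_rad = flow_observables(symmetric_matrix(n, "rademacher", rng), x0, times, k_conf)
print("max gap in |X_t|^2/N:", np.max(np.abs(norm_gauss - norm_rad)))
print("max gap in energy/N :", np.max(np.abs(energy_gauss - energy_rad)))
```

At this moderate N the two trajectories should agree only up to finite-size fluctuations; the gap is expected to shrink as N grows.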
2 Universality of expectations of monomial observables
In this section, we prove that two solutions \({\mathbf {X}}\) and \({\tilde{{\mathbf {X}}}}\) of (1.2) driven by \({\mathbf {J}}\) and \({\tilde{{\mathbf {J}}}}\) are such that expectations of observables of the form (1.10) are universal, as long as \({\mathbf {A}}\) and \({\tilde{{\mathbf {A}}}}\) have the same variance profiles. As discussed in Sect. 1.3, we reduce differences in expectations to combinatorial calculations by expanding the Markov transition semi-group of the process \({\mathbf {X}}_t\) in terms of its generator, an approach for proving universality in randomly driven dynamical systems which is the key contribution of this paper.
For the entirety of this paper, we will take two distributions \({\mathbb {P}}_{{\mathbf {A}}}\) and \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) on \({\mathbf {A}}\) and \({\tilde{{\mathbf {A}}}}\) that are mean zero and have the same, uniformly bounded, variance profiles \({\mathbf {m}}= {\tilde{{\mathbf {m}}}}\). Recall that \({\mathbb {P}}_{{\mathbf {A}}}\) and \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) are either fully independent or symmetric ensembles. For conciseness, we present our results in the case of fully independent (in particular, not symmetric) ensembles. The case where they are symmetric is handled mutatis mutandis and only induces a few constant factors in certain estimates (see Remark 2.8 for more on these minimal modifications).
2.1 Main result on difference in expectations
The observables in Theorem 1 are composed of polynomials in \({\mathbf {J}}\) and \({\mathbf {X}}\), as well as \({\mathbf {M}}\). We first establish the universality of expectations for general monomials in \({\mathbf {J}}\) and \({\mathbf {X}}\) via a combinatorial moment matching type of argument. In Sect. 3, such universality for monomials that additionally involve the martingale is reduced to that of monomials only in \({\mathbf {J}}\) and \({\mathbf {X}}\).
More precisely, the statistics we consider throughout this section are of the following form. Fix any s (not necessarily distinct) pairs \({\varvec{\alpha }}=( \alpha _1,\ldots ,\alpha _s)\) where each \(\alpha _k = i_kj_k\), and r-tuple (not necessarily distinct) \(\varvec{\gamma }= (\gamma _1,\ldots ,\gamma _r)\) where each \(\gamma _i \in \{1,\ldots ,N\}\). Then consider observables \(f_{{\varvec{\alpha }},{\varvec{\gamma }}}({{\mathbf {x}}})\) of the form
For an s-tuple of pairs \({\varvec{\alpha }}\), let
- \(I_{{\varvec{\alpha }}}\) count the number of distinct pairs in \({\varvec{\alpha }}\), i.e., \(I_{\varvec{\alpha }}= |\{\alpha _1,\ldots , \alpha _s\}|\),
- \(I_{{\varvec{\alpha }},1}\) count the number of \((\alpha _k)_k\) which appear exactly once in \({\varvec{\alpha }}\), and
- \(I^+_{{\varvec{\alpha }},1}\) equal \(I_{{\varvec{\alpha }},1}\) plus the indicator that no pair appears more than twice in \({\varvec{\alpha }}\).
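The three counts just defined are easy to compute mechanically. The sketch below does so for the fully independent case, where the pairs \(ij\) and \(ji\) are treated as different; the function name and the encoding of a pair as a Python tuple are illustrative choices, not notation from the paper.

```python
from collections import Counter


def pair_statistics(alpha):
    """Return (I_alpha, I_{alpha,1}, I^+_{alpha,1}) for a tuple of index pairs."""
    counts = Counter(alpha)
    distinct = len(counts)                                # number of distinct pairs
    singles = sum(1 for c in counts.values() if c == 1)   # pairs appearing exactly once
    no_triple = all(c <= 2 for c in counts.values())      # no pair appears more than twice
    return distinct, singles, singles + int(no_triple)


alpha = ((1, 2), (1, 2), (3, 4), (2, 1), (1, 2))
print(pair_statistics(alpha))  # prints (3, 2, 2)
```

For the empty tuple (the case \(s=0\)) the indicator equals one, so the function returns `(0, 0, 1)`.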
Our bound on the distance between the expectations of \(f_{{\varvec{\alpha }},{\varvec{\gamma }}}({\mathbf {X}}_t)\) and \(f_{{\varvec{\alpha }},{\varvec{\gamma }}}({\tilde{{\mathbf {X}}}}_t)\) depends on \({\varvec{\alpha }}\), \({\varvec{\gamma }}\) and the laws \(\mu \), \({\mathbb {P}}_{{\mathbf {A}}}\), \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) only through \({{\mathbf {C}}}_\star \), \(C_\mu \), s, r and \(I^+_{{\varvec{\alpha }},1}\). More precisely, we derive here the following.
Proposition 2.1
There exists \(C= C(r,s,T, {\mathbf {C}}_\star ,C_\mu (r))\) such that for every \(T,r,s \ge 0\), every s-tuple of pairs \({\varvec{\alpha }}\) and every r-tuple \({\varvec{\gamma }}\), if \({\mathbb {P}}_{{\mathbf {A}}}\), \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\) and \(\mu \) satisfy Hypothesis 1, then
Observe that in the case \(s=0\), the right-hand side is \(C N^{-1/2}\).
Remark 2.2
The above proposition shows that having more distinct J’s in the observable decreases the difference in expectations by more than the factor \(N^{-s/2}\) that one would expect from the typical size of \(J_{ij}\). This is to be expected due to CLT-type cancellations: one way to motivate this scaling is by recalling averaged statistics which have J in them, in the context of the spherical SK model, e.g., the most relevant being
(Notice that these statistics are not rescaled by the number of order-one sized monomials; but they remain on the O(1) scale due to additional cancellations from \((J_{ij})\)). This gain in the scaling has to be visible at the level of the difference in expectations under \({\mathbb {P}}\) and \({\tilde{{\mathbb {P}}}}\) in order to hope for universality for such statistics.
Recall from Sect. 1.3 that our high level strategy is to reduce the expectations of statistics of the solution \({\mathbf {X}}_t\) of the SDS to combinatorial calculations in terms of mixed moments of \({\mathbf {J}}\) and \({\mathbf {X}}_0\). This is possible by writing \({\mathbb {E}}_{{\mathbf {B}}}[f({\mathbf {X}}_t)]\) as \(P_t f ({\mathbf {X}}_0)\) and then Taylor expanding \(P_t = e^{tL}\) where L is the generator for the process \({\mathbf {X}}_t\) as defined in (1.13). In order for this expansion to be valid, and therefore our approach to be permissible, we need the Taylor expansion for \(e^{tL}\) to converge absolutely, for each fixed N. In the next sub-section, we show that indeed with \(\mu , {\mathbb {P}}_{\mathbf {A}}, {\mathbb {P}}_{\tilde{{\mathbf {A}}}}\) satisfying Hypothesis 1, for each fixed N, the infinite series corresponding to \(P_t f\) converges absolutely, so we can follow this plan.
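To make the expansion \(P_t = e^{tL}\) concrete in the simplest possible setting, the sketch below treats a deterministic linear system \(d{{\mathbf {x}}}/dt = {\mathbf {M}}{{\mathbf {x}}}\) (no noise, no external field) and a linear observable \(f({{\mathbf {x}}}) = \langle {\mathbf {a}},{{\mathbf {x}}}\rangle \). In that special case the generator is \(Lf({{\mathbf {x}}}) = \langle {\mathbf {M}}{{\mathbf {x}}}, \nabla f({{\mathbf {x}}})\rangle \), so \(L^k f({{\mathbf {x}}}) = \langle {\mathbf {a}}, {\mathbf {M}}^k {{\mathbf {x}}}\rangle \). These simplifications, the Gaussian choice of \({\mathbf {M}}\), and the function names are assumptions of the illustration and do not reproduce the generator (1.13).

```python
import numpy as np


def taylor_semigroup(m, a, x0, t, order):
    """Partial sum of sum_k t^k/k! (L^k f)(x0) for f(x) = <a, x>, L f(x) = <M x, grad f(x)>."""
    total = 0.0
    vec = x0.copy()  # holds M^k x0
    coeff = 1.0      # holds t^k / k!
    for k in range(order + 1):
        total += coeff * (a @ vec)
        vec = m @ vec
        coeff *= t / (k + 1)
    return total


def exact_value(m, a, x0, t):
    """f(X_t) = <a, exp(tM) x0>, computed by diagonalizing the symmetric matrix M."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return a @ (eigvecs @ (np.exp(t * eigvals) * (eigvecs.T @ x0)))


rng = np.random.default_rng(1)
n, t = 50, 1.0
g = rng.standard_normal((n, n))
m = (g + g.T) / np.sqrt(2 * n)  # symmetric, off-diagonal variance 1/n
a = np.ones(n) / n              # f averages the coordinates
x0 = rng.standard_normal(n)

approx = taylor_semigroup(m, a, x0, t, order=40)
exact = exact_value(m, a, x0, t)
print(approx, exact)
```

For fixed N the series converges rapidly here because the operator norm of \({\mathbf {M}}\) is bounded; the difficulty addressed in the next sub-section is to control the series in expectation over the random coefficients.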
Before proceeding further, we make the following notational remark.
Notational comment on set and sequence differences For sets \(\{b_1,\ldots ,b_m\} \subset \{a_1,\ldots ,a_n\}\), we let \(\{a_1,\ldots ,a_n\}\setminus \{b_1,\ldots ,b_m\}\) denote the set difference as usual. Frequently we deal with tuples, or sequences in which the order does not matter. For two such tuples \((a_1,\ldots ,a_n)\) and \((b_1,\ldots ,b_m)\) (where of course there may be repetitions in each sequence), we denote by \((a_1,\ldots ,a_n)\setminus (b_1,\ldots ,b_m)\) the difference wherein for each \(b_i\) appearing in \(\{a_1,\ldots ,a_n\}\) we only remove one of its appearances—say the first one—from \((a_1,\ldots ,a_n)\). We also define \((a_1,\ldots ,a_n) \amalg (b_1,\ldots ,b_m)\) to be the concatenation given by \((a_1,\ldots ,a_n,b_1,\ldots ,b_m)\).
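The two sequence operations can be sketched as follows, with tuples standing in for sequences whose order is immaterial. The function names are illustrative; the difference removes, for each entry of the second sequence, only the first remaining appearance in the first one, as described above.

```python
def seq_difference(a, b):
    """Sequence difference: for each entry of b, drop its first remaining appearance in a."""
    remaining = list(a)
    for item in b:
        if item in remaining:
            remaining.remove(item)  # list.remove deletes the first match only
    return tuple(remaining)


def seq_concat(a, b):
    """Concatenation (a_1, ..., a_n, b_1, ..., b_m)."""
    return tuple(a) + tuple(b)


print(seq_difference((3, 1, 3, 2), (3,)))    # prints (1, 3, 2)
print(seq_difference((3, 1, 3, 2), (3, 3)))  # prints (1, 2)
print(seq_concat((1, 2), (2, 0)))            # prints (1, 2, 2, 0)
```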
2.2 Switching the expectation and the infinite series
The goal of this sub-section is to prove the following absolute convergence result.
Proposition 2.3
Suppose \({\mathbb {P}}_{\mathbf {A}}\) and \(\mu \) satisfy Hypothesis 1. Then, there exists finite \(N_o=N_o(r,T,{\mathbf {C}}_\star )\) such that for every \(N \ge N_o\), every \(T<\infty \), every s-tuple of pairs \({\varvec{\alpha }}\), and every r-tuple of indices \({\varvec{\gamma }}\), we have
As a consequence of Proposition 2.3 and Fubini–Tonelli, we may use the following expansion.
Corollary 2.4
Suppose \({\mathbb {P}}_{{\mathbf {A}}}\), \({\mathbb {P}}_{{\tilde{{\mathbf {A}}}}}\), \(\mu \) satisfy Hypothesis 1. Setting L and \({{\tilde{L}}}\) for their generators, we have that
for every \(N \ge N_o(r,T,{{\mathbf {C}}}_\star )\), every \(t <\infty \), and every s-tuple of pairs \({\varvec{\alpha }}\) and r-tuple of indices \({\varvec{\gamma }}\).
Proceeding hereafter to prove Proposition 2.3, we fix \(r,s,{\varvec{\alpha }}\) and \({\varvec{\gamma }}\), and set \(f=f_{{\varvec{\alpha }},{\varvec{\gamma }}}\). Aiming for upper bounds on \({\mathbb {E}}[|L^{k}f({\mathbf {X}}_0)|]\) which are summable against \(T^{k}/k!\), we first utilize (1.13) to expand \(L^{k}\) as a sum over the \(4^{k}\) words W in the letters \(\{L_{{\mathbf {J}}},L_{{\varvec{\Lambda }}},L_{{\mathbf {h}}},L_{\Delta }\}\) and thereby get the bound
where for every \({{\mathbf {x}}}\in {\mathbb {R}}^N\), \(Wf({{\mathbf {x}}})\) should be understood as \((W_k \cdots W_2 W_1 f)({{\mathbf {x}}})\). For every word \(W\in \{L_{{\mathbf {J}}},L_{{\varvec{\Lambda }}},L_{{\mathbf {h}}},L_{\Delta }\}^{k}\), let \(k_{{\mathbf {J}}}=k_{\mathbf {J}}(W)\) denote the number of \(L_{{\mathbf {J}}}\)’s that appear in W, and similarly define \(k_{{\varvec{\Lambda }}}\), \(k_{{\mathbf {h}}}\), and \(k_{\Delta }\), so that \(k_{{\mathbf {J}}}+k_{{\varvec{\Lambda }}}+k_{{\mathbf {h}}}+k_{\Delta }=k\) and the following structural decomposition of Wf holds.
Claim
For any word \(W\in \{L_{{\mathbf {J}}},L_{{\varvec{\Lambda }}},L_{{\mathbf {h}}},L_{\Delta }\}^{k}\) with \(k_{{\mathbf {J}}},k_{{\varvec{\Lambda }}},k_{{\mathbf {h}}},k_{\Delta }\) occurrences of the corresponding symbols, Wf can be expressed as a sum of (not necessarily distinct) monomials of the form
Here, \({\varvec{\beta }},{\varvec{\beta }}',{\varvec{\zeta }}\) denote the collections of pairs \((\beta _\ell )_{\ell \le k_ {\mathbf {J}}}\), \((\beta '_{\ell })_{\ell \le k_{\varvec{\Lambda }}}\), \((\zeta _\ell )_{\ell \le 2 k_\Delta }\), while \({\varvec{\zeta }}',{\varvec{\xi }}\) denote the sequences \((\zeta '_\ell )_{\ell \le k_{\mathbf {h}}}\), \((\xi _\ell )_{\ell \le r}\), and hereupon we adopt the convention \(x_0 \equiv 1\), allowing for \(\xi _\ell = 0\) as well as \(\zeta _\ell \in (0j)_j\).
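The word decomposition of \(L^k\) can be enumerated directly for small k. The sketch below lists the \(4^k\) words and tabulates their letter counts \((k_{\mathbf {J}},k_{\varvec{\Lambda }},k_{\mathbf {h}},k_\Delta )\); the string labels are stand-ins for the four operators and carry no further meaning.

```python
from collections import Counter
from itertools import product

LETTERS = ("J", "Lambda", "h", "Delta")  # stand-ins for L_J, L_Lambda, L_h, L_Delta


def words(k):
    """All 4^k words of length k in the four letters."""
    return list(product(LETTERS, repeat=k))


def letter_counts(word):
    """Return (k_J, k_Lambda, k_h, k_Delta) for a word."""
    c = Counter(word)
    return tuple(c[letter] for letter in LETTERS)


k = 3
profile = Counter(letter_counts(w) for w in words(k))
print(len(words(k)))         # prints 64
print(profile[(1, 1, 1, 0)])  # prints 6
```

Each count profile is hit by a multinomial number of words, for instance \(3!/(1!\,1!\,1!\,0!) = 6\) words for the profile \((1,1,1,0)\).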
In view of Hypothesis 1 on \({\mathbb {P}}_{\mathbf {A}}\) we have that for every N, \(\ell \ge 0\), and index pair \(\alpha \),
Thus, if \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}\) distinct index pairs appear at multiplicities \((n_{\ell }+1)_{\ell \le I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}}\) in the sequence \({\varvec{\alpha }}\amalg {\varvec{\beta }}\) of length \(k_{\mathbf {J}}+s\), then by the independence of \((J_\alpha )_\alpha \),
Consequently, with \({\mathbf {X}}_0\) independent of \({\mathbf {J}}\) we have in view of the assumed bounds on \((\Lambda _{ij})_{i,j}\) \((\sigma _{ij})_{i,j}\) and \((h_{i})_{i}\), that for any term of the form (2.3) with \(I_{{\varvec{\zeta }}}\) entries such that \(\zeta _\ell \not \in (0j)_j\),
using in the last inequality also (1.6) from Hypothesis 1 on \(\mu \), and the definition of \({\mathbf {C}}_\star \).
Our next result is a first step in controlling the number of monomial terms that can appear in the expansion of each word \(W\in \{L_{{\mathbf {J}}},L_{{\varvec{\Lambda }}},L_{{\mathbf {h}}},L_{\Delta }\}^{k}\).
Lemma 2.5
For every \(k_{\mathbf {J}}, k_{\varvec{\Lambda }}, k_{\mathbf {h}}, k_\Delta \) and every \({\varvec{\beta }},{\varvec{\beta }}',{\varvec{\zeta }}',{\varvec{\zeta }},{\varvec{\xi }}\), if we let \(\phi =\phi _{{\varvec{\beta }},{\varvec{\beta }}',{\varvec{\zeta }}',{\varvec{\zeta }},{\varvec{\xi }}}\) be as in (2.3), then \(L_{\mathbf {h}}\phi \), \(L_{\mathbf {J}}\phi \), \(L_{{\varvec{\Lambda }}} \phi \) and \(L_{\Delta } \phi \) can each be expressed as a sum of at most r, rN, \(r {{\mathcal {N}}}_{{\varvec{\Lambda }}}\) and \(r {{\mathcal {N}}}_{{\varvec{\sigma }}}^2\) many such monomials, respectively, each of the same form (with possibly different \({\varvec{\beta }},{\varvec{\beta }}',{\varvec{\zeta }}',{\varvec{\zeta }},{\varvec{\xi }}\)) as (2.3), with the respective \(k_{{\mathbf {J}}},k_{{\varvec{\Lambda }}},k_{{\mathbf {h}}}\) or \(k_{\Delta }\) increased by one.
Proof
Fixing \(k_{{\mathbf {J}}},k_{{\varvec{\Lambda }}},k_{{\mathbf {h}}},k_{\Delta }\) which sum up to k, we proceed by separately considering the effect each of \(L_{\mathbf {h}}\phi \), \(L_{{\mathbf {J}}}\phi \), \(L_{{\varvec{\Lambda }}}\phi \) and \(L_{\Delta }\phi \) has on the monomial \(\phi \). First,
with non-zero contribution only from \(j \in {\varvec{\xi }}\), yielding at most r non-zero terms. To each of these corresponds a monomial of the form of (2.3), for \(k_{{\mathbf {h}}}\mapsto k_{{\mathbf {h}}}+1\), \({\varvec{\zeta }}' \mapsto {\varvec{\zeta }}' \amalg (j)\) and \({\varvec{\xi }}\mapsto ({\varvec{\xi }}\setminus (j))\amalg (0)\). Next,
with non-zero contribution only when \(j\in {\varvec{\xi }}\). With \(i \le N\) the total number of resulting non-zero monomials is now at most rN, each having the stated form with \(k_{{\mathbf {J}}}\mapsto k_{{\mathbf {J}}}+1\), \({\varvec{\beta }}\mapsto {\varvec{\beta }}\amalg (ij)\) and \({\varvec{\xi }}\mapsto ({\varvec{\xi }}\setminus (j)) \amalg (i)\). Likewise, we have that
with non-zero contributions only for \(j\in {\varvec{\xi }}\). Enumerating over \(i \le N\), gives now at most \(r{{\mathcal {N}}}_{\varvec{\Lambda }}\) non-zero monomials, of the stated form, with \(k_{{\varvec{\Lambda }}}\mapsto k_{{\varvec{\Lambda }}}+1\), \({\varvec{\beta }}'\mapsto {\varvec{\beta }}'\amalg (ij)\) and \({\varvec{\xi }}\mapsto ({\varvec{\xi }}\setminus (j)) \amalg (i)\). Finally,
is non-zero only for the summands in which \(j\in {\varvec{\xi }}\). Enumerating over \(0 \le i,i' \le N\) (recalling the convention that \(x_0\equiv 1\)), gives at most \(r{{\mathcal {N}}}_{\varvec{\sigma }}^2\) non-zero monomials, of the stated form, with \(k_{\Delta }\mapsto k_{\Delta }+1\), \({\varvec{\zeta }}\mapsto {\varvec{\zeta }}\amalg (ij) \amalg (i'j)\) and \({\varvec{\xi }}\mapsto ({\varvec{\xi }}\setminus (j,j)) \amalg (i,i')\). \(\square \)
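The index bookkeeping in the proof above can be mimicked in code for the two simplest letters. The sketch tracks only the index data \(({\varvec{\beta }},{\varvec{\zeta }}',{\varvec{\xi }})\) of a monomial and applies the update rules stated for \(L_{\mathbf {h}}\) and \(L_{\mathbf {J}}\); numerical coefficients, and the letters \(L_{\varvec{\Lambda }}\) and \(L_\Delta \), are deliberately omitted, and the data layout and function names are assumptions of this illustration.

```python
def remove_first(seq, item):
    """Drop the first appearance of item from the tuple seq."""
    out = list(seq)
    out.remove(item)
    return tuple(out)


def apply_h(mono):
    """L_h: for each non-zero j in xi, zeta' gains (j) and xi trades j for 0."""
    beta, zeta_p, xi = mono
    return [(beta, zeta_p + (j,), remove_first(xi, j) + (0,)) for j in xi if j != 0]


def apply_j(mono, n):
    """L_J: for each non-zero j in xi and each i <= n, beta gains (i, j) and xi trades j for i."""
    beta, zeta_p, xi = mono
    return [
        (beta + ((i, j),), zeta_p, remove_first(xi, j) + (i,))
        for j in xi if j != 0       # x_0 = 1 has zero derivative
        for i in range(1, n + 1)
    ]


def apply_word(word, mono, n):
    """Apply the letters of word in order (first letter first); return all resulting monomials."""
    monos = [mono]
    for letter in word:
        if letter not in ("h", "J"):
            raise ValueError(f"letter not covered by this sketch: {letter}")
        step = []
        for m in monos:
            step.extend(apply_h(m) if letter == "h" else apply_j(m, n))
        monos = step
    return monos


n = 3
f = ((), (), (1, 2))  # index data of f(x) = x_1 x_2, so r = 2
print(len(apply_word("J", f, n)), len(apply_word("JJ", f, n)), len(apply_word("hJ", f, n)))
# prints 6 36 6
```

The counts respect the bounds r and rN per application; the word `"hJ"` yields fewer than \(r \cdot rN = 12\) terms because the entry \(\xi _\ell = 0\) created by \(L_{\mathbf {h}}\) no longer contributes.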
Fixing N, k, an s-tuple of pairs \({\varvec{\alpha }}\), an r-tuple of indices \({\varvec{\gamma }}\) and \(W\in \{L_{\mathbf {J}}, L_{\varvec{\Lambda }}, L_{\mathbf {h}}, L_\Delta \}^{k}\), upon inductively applying Lemma 2.5, we are able to express Wf as the sum of at most
many non-zero monomials of the form of (2.3). Recall that for a monomial \(\phi \), we use \(I_{\varvec{\zeta }}\) for the number of \(\zeta _{\ell }\notin (0j)_{j}\), \(I_{\varvec{\alpha }}\) for the number of distinct pairs in \({\varvec{\alpha }}\), \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}\) for the number of distinct pairs in \({\varvec{\alpha }}\amalg {\varvec{\beta }}\), and introduce \(I_\star = I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}} - I_{\varvec{\alpha }}\), which counts the number of distinct pairs in \(\{{\varvec{\beta }}\} \setminus \{{\varvec{\alpha }}\}\). A careful examination of the proof of Lemma 2.5 yields the following significant refinement upon the crude bound of (2.9).
Proposition 2.6
Fix N, \(r,s,k \ge 0\), an s-tuple of pairs \({\varvec{\alpha }}\), an r-tuple of indices \({\varvec{\gamma }}\), and a word \(W\in \{L_{\mathbf {J}}, L_{\varvec{\Lambda }}, L_{\mathbf {h}}, L_\Delta \}^{k}\). Then, of the monomials in such expansion of Wf, at most
have \(I_{{\varvec{\zeta }}}\) elements of \({\varvec{\zeta }}\) with \(\zeta _\ell \not \in (0j)_j\), and the \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}=I_{\varvec{\alpha }}+I_\star \) distinct pairs in \({\varvec{\alpha }}\amalg {\varvec{\beta }}\) appear in multiplicities \(\{n_\ell + {\mathbf {1}}_{\{\ell > I_{\varvec{\alpha }}\}}\}_{\ell \le I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}}\) within the sequence \({\varvec{\beta }}\) of length \(k_{\mathbf {J}}\). (N.b. we ordered the \((n_\ell )\) with multiplicities in \({\varvec{\beta }}\) of the distinct pairs of \({\varvec{\alpha }}\) appearing first, and the multiplicities in \({\varvec{\beta }}\) of the remaining \(I_\star \) distinct pairs next.)
Proof
The first improvement in (2.10) over (2.9) is from observing that the growth factor \({{\mathcal {N}}}_{{\varvec{\sigma }}}\) applies only in those \(I_{\varvec{\zeta }}\) of the \(2 k_\Delta \) applications of \(L_\Delta \) within W which have led to an element \(\zeta _\ell \not \in (0j)_j\) (see (2.8)), and that there are at most \({2 k_\Delta \atopwithdelims ()I_{\varvec{\zeta }}}\) ways to choose which \(I_{\varvec{\zeta }}\) elements of \({\varvec{\zeta }}\) are not from the 0-th row of \({\varvec{\sigma }}\).
Similarly, the growth factor N in counting the number of monomials after applying \(L_{\mathbf {J}}\) is only relevant during the \(I_\star \) applications of \(L_{\mathbf {J}}\) within W in which a new pair (ij) is selected (see (2.6)). The left-most term in (2.10) counts the number of ways to select the locations of these \(I_\star \) new elements within the \(k_{\mathbf {J}}\) long sequence \({\varvec{\beta }}\), and thereafter to partition the remaining \(k_{\mathbf {J}}-I_\star \) consistently with having the prescribed \(n_\ell \ge 0\) repeats for each of the \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}\) distinct pairs in question. Putting all this together yields the stated bound (2.10) on the number of relevant monomials in the expansion of Wf. \(\square \)
Proof of Proposition 2.3
Combining Proposition 2.6 with the bound (2.4) we deduce that for any word W of length k and any \({\varvec{\alpha }}\) whose \(I_{\varvec{\alpha }}\) distinct terms appear in multiplicities \((c_\ell )_{\ell \le I_{\varvec{\alpha }}}\),
where the inner sum is over all partitions of \(k_{\mathbf {J}}-I_\star \) into \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}\) indistinguishable integers \(n_\ell \ge 0\). Since \( \sum _\ell c_\ell = s\) and \(n_\ell +c_\ell \le k_{\mathbf {J}}+s\) for all \(\ell \), the right-most product is at most \((k_{\mathbf {J}}+s)^s\). Further, the number of \((n_\ell )_\ell \) considered here is at most the number of integer partitions of \(k_{\mathbf {J}}\), which grows slower than \(e^{k_{\mathbf {J}}}\) (cf. the Hardy–Ramanujan asymptotic partition formula [24]). Thus, we find that for \(C(r,s,C_\mu ,{{\mathbf {C}}}_\star )\) finite and any word W of length k,
Since \(k! \ge k_{\mathbf {J}}! (k-k_{\mathbf {J}})!\), the bounds (2.12) and (2.2) will yield the stated absolute convergence of the infinite series. Specifically, fixing \(T<\infty \) and setting \(\delta =1/(16 T r e {{\mathbf {C}}}_\star )\), we have that
which is finite for any fixed \(N > \delta ^{-2}\), thereby concluding the proof. \(\square \)
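The partition-counting step used in the proof above can be checked numerically for small values. The sketch below computes the number \(p(k)\) of integer partitions of k by a standard dynamic program and compares it with \(e^{k}\); it is a sanity check for small k only, and the function name is illustrative.

```python
from math import exp


def partition_numbers(k_max):
    """Return the list p with p[k] = number of integer partitions of k, for 0 <= k <= k_max."""
    p = [1] + [0] * k_max
    for part in range(1, k_max + 1):          # allow parts of size `part`
        for total in range(part, k_max + 1):
            p[total] += p[total - part]
    return p


p = partition_numbers(100)
print(p[10], p[20], p[100])  # prints 42 627 190569292
print(all(p[k] < exp(k) for k in range(1, 101)))  # prints True
```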
2.3 Controlling the differences of the k’th order Taylor coefficients
By Corollary 2.4, we have that
where the last sum is over \(\phi \) appearing in the monomial decomposition of \(Wf({{\mathbf {x}}})\) per Claim 2.2. To bound the differences of expectations on the rhs of (2.14), we next control the type of monomials \(\phi \) of the form (2.3) in the expansion of Wf, for which we may possibly have \({\mathbb {E}}[\phi ({\mathbf {X}}_0)] \ne {\tilde{{\mathbb {E}}}}[\phi ({\mathbf {X}}_0)]\).
Lemma 2.7
For any \(k,s \ge 0\), every s-tuple of pairs \({\varvec{\alpha }}\), and every \(W\in \{L_{\mathbf {J}}, L_{\varvec{\Lambda }}, L_{\mathbf {h}}, L_\Delta \}^k\), the monomials \(\phi \) in the expansion of Wf in Claim 2.2 may have \({\mathbb {E}}[\phi ({\mathbf {X}}_0)] \ne {\tilde{{\mathbb {E}}}}[\phi ({\mathbf {X}}_0)]\) only if
where as before, \(I_\star = I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}} - I_{\varvec{\alpha }}\) denotes the number of distinct elements in \(\{{\varvec{\beta }}\}\setminus \{\varvec{\alpha }\}\).
Proof
By the independence of \({\mathbf {J}},{\tilde{{\mathbf {J}}}}\) and \(\mu \), if \({\mathbb {E}}[\phi ({\mathbf {X}}_0)] \ne {\tilde{{\mathbb {E}}}}[\phi ({\mathbf {X}}_0)]\) for some \(\phi = \phi _{{\varvec{\beta }},{\varvec{\beta }}',{\varvec{\zeta }},{\varvec{\zeta }}',{\varvec{\xi }}}\) as in (2.3), then
which for independent, zero-mean \((J_{ij})_{ij}\) of matching variances \(\frac{1}{N} {\mathbf {m}}= \frac{1}{N} {\tilde{{\mathbf {m}}}}\), requires that simultaneously:
The condition (2.16) implies that each of the \(I_\star \) distinct elements in \(\{{\varvec{\beta }}\} \setminus \{{\varvec{\alpha }}\}\) must appear at least twice in \(\{{\varvec{\beta }}\}\), to which end we need at least \(2 I_\star \) applications of \(L_{\mathbf {J}}\) to select those elements. In addition, some other \(I_{{\varvec{\alpha }},1}\) of the \(k_{\mathbf {J}}\) applications of \(L_{\mathbf {J}}\) must align exactly with the pairs \((\alpha _{ij})\) appearing only once in \({\varvec{\alpha }}\), so necessarily \(k_{\mathbf {J}}\ge 2 I_\star + I_{{\varvec{\alpha }},1}\). Further, the condition (2.17) requires \(k_{\mathbf {J}}+s \ge 3\) and when no pair appears more than twice in \({\varvec{\alpha }}\), an extra application of \(L_{\mathbf {J}}\) beyond the preceding \(2 I_\star + I_{{\varvec{\alpha }},1}\) is needed for producing the third appearance of some \(\alpha _\star \), as stated in (2.15). \(\square \)
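The counting argument in this proof can be verified by brute force on a small alphabet of pairs. The sketch encodes the two requirements described above as: every pair in \({\varvec{\alpha }}\amalg {\varvec{\beta }}\) appears at least twice, and some pair appears at least three times. It then checks that these force \(k_{\mathbf {J}}\ge 2 I_\star + I^+_{{\varvec{\alpha }},1}\). The abstract labels for pairs and the function names are assumptions of the illustration, and the encoding paraphrases conditions (2.16)–(2.17) as they are described in the text rather than quoting them.

```python
from collections import Counter
from itertools import product


def may_differ(alpha, beta):
    """Necessary condition for the mixed J-moment to differ between two zero-mean laws
    with matching variances: all multiplicities >= 2 and at least one >= 3."""
    counts = Counter(alpha + beta)
    if not counts:
        return False
    return min(counts.values()) >= 2 and max(counts.values()) >= 3


def lower_bound(alpha, beta):
    """Compute 2 * I_star + I^+_{alpha,1}."""
    a_counts = Counter(alpha)
    i_star = len(set(beta) - set(alpha))                     # distinct pairs of beta not in alpha
    singles = sum(1 for c in a_counts.values() if c == 1)    # I_{alpha,1}
    no_triple = all(c <= 2 for c in a_counts.values())
    return 2 * i_star + singles + int(no_triple)


pairs = ("a", "b", "c")  # abstract labels standing for index pairs ij
checked = 0
violations = 0
for s in range(4):
    for k_j in range(5):
        for alpha in product(pairs, repeat=s):
            for beta in product(pairs, repeat=k_j):
                if may_differ(alpha, beta):
                    checked += 1
                    if k_j < lower_bound(alpha, beta):
                        violations += 1
print(checked > 0, violations)  # prints True 0
```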
We are now able to prove that the expectations of monomials of the form \(f_{{\varvec{\alpha }},{\varvec{\gamma }}}({\mathbf {X}}_{t})\) are universal.
Proof of Proposition 2.1
Fixing \({\varvec{\alpha }},{\varvec{\gamma }}\), in view of Lemma 2.7, it suffices when bounding the rhs of (2.14) to consider only words W and monomials \(\phi \) for which (2.15) holds. Restricting attention to such monomials, we find as in (2.11), that for any \({\varvec{\alpha }}\) whose \(I_{\varvec{\alpha }}\) distinct terms appear in multiplicities \((c_\ell )_{\ell \le I_{\varvec{\alpha }}}\), and every word W of length k such that \(k_{\mathbf {J}}+ s\ge 3\),
where as in (2.11), the inner sum runs over all partitions of \(k_{\mathbf {J}}-I_\star \) into \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}\) indistinguishable integers \(n_\ell \ge 0\). Reasoning as we did leading up to (2.12), we find that
Plugging (2.18) into (2.14), as in the derivation of (2.13), we get for \(\delta = 1/(16 T r e {{\mathbf {C}}}_\star )\) and \(N \ge \rho := (2/\delta )^2\),
where \({\bar{C}}=2 C e^{-1/\delta } \rho ^{I^+_{{\varvec{\alpha }},1}/2}\). This completes the proof, as both series on the rhs of (2.19) are finite and independent of N. \(\square \)
Remark 2.8
In the case of symmetric random matrices \({\mathbf {A}}\), \({\tilde{{\mathbf {A}}}}\) (where only the upper triangular and diagonal elements are independent), we identify index pairs \(\beta = ij\) and \({{\hat{\beta }}}=ji\) as being the same. We do so whenever considering \(I_{\varvec{\alpha }}\), \(I_{{\varvec{\alpha }},1}\), \(I^+_{{\varvec{\alpha }},1}\), \(I_{{\varvec{\alpha }}\amalg {\varvec{\beta }}}\), \(I_\star \), and the multiplicities \((n_\ell )_\ell \), as well as in the restrictions (2.16)–(2.17) imposed on the multiplicities within \({\varvec{\alpha }}\amalg {\varvec{\beta }}\). Once this is done, the only difference in our proof is to replace in (2.10) the weight \(r^k\) by \((2r)^k\).
3 The extension to multi-time polynomial observables
In this section, we extend the results of Sect. 2 to more general observables, namely those that contain coefficients that depend on the driving martingale, and those that depend on the trajectory through multiple times, rather than just one. We then use those extensions to prove Theorem 1. To this end, fix any l, any \(({\varvec{\alpha }}^{(1)},\ldots ,{\varvec{\alpha }}^{(l)})\) each consisting of \(s_i\) pairs, any \(({\varvec{\gamma }}^{(1)},\ldots ,{\varvec{\gamma }}^{(l)})\) each consisting of \(r_i\) indices, and also fix m indices \({\varvec{\xi }}=( \xi _1,\ldots ,\xi _m )\). Fix l times \(0\le t_1\le \cdots \le t_l\le T\) and m times \(0\le u_1 \le \cdots \le u_m\le T\). For \(f_{{\varvec{\alpha }}^{(i)},{\varvec{\gamma }}^{(i)}}\) as in (2.1), consider observables of the form,
Let \({\bar{r}} = \sum _i r_i +m\) and \({\bar{{\varvec{\alpha }}}}\) denote the concatenation \({\varvec{\alpha }}^{(1)} \amalg \cdots \amalg {\varvec{\alpha }}^{(l)}\) of length \({\bar{s}} := \sum _i s_i\).
Proposition 3.1
There exist finite \(C({\bar{r}}, {\bar{s}}, m,l, T, {\mathbf {C}}_\star ,C_\mu ({\bar{r}}))\) such that for every l, m, every \(({\varvec{\alpha }}^{(i)})_{i\le l}\), \(({\varvec{\gamma }}^{(i)})_{i \le l}\), \({\varvec{\xi }}\), every \({\mathbf {t}}\in [0,T]^l\), \({\mathbf {u}}\in [0,T]^m\) and \(g({\mathbf {t}},{\mathbf {u}})= g_{({\varvec{\alpha }}^{(i)}),({\varvec{\gamma }}^{(i)}),{\varvec{\xi }}} ({\mathbf {t}}, {\mathbf {u}})\) as in (3.1),
We proceed to prove Proposition 3.1, which we thereafter combine with a short combinatorial estimate bounding the number of terms with specific values of \(I^+_{{\bar{{\varvec{\alpha }}}},1}\) to establish Theorem 1.
3.1 Proof of Proposition 3.1
We start with the case of \(m=0\) to which we will reduce the case of \(m>0\).
Lemma 3.2
Proposition 3.1 holds when \(m=0\).
Proof
Fixing l, \(({\varvec{\alpha }}^{(i)})_{i\le l}\) and \(({\varvec{\gamma }}^{(i)})_{i \le l}\), we set here \(f^{(i)}({{\mathbf {x}}})=f_{{\varvec{\alpha }}^{(i)},{\varvec{\gamma }}^{(i)}}({{\mathbf {x}}})\) and
and for any l-tuple of times \({\mathbf {t}}= (t_1,\ldots , t_l)\in [0,T]^l\), evaluate (3.2) on the argument \(({\mathbf {X}}_{t_1},\ldots , {\mathbf {X}}_{t_l})\): i.e., let
We express the expectation \({\mathbb {E}}_{{\mathbf {B}}}\) with respect to the Brownian motion of \(g({\mathbf {t}})\), in terms of the (diffusion) semi-group operator as
Expanding each semi-group operator in terms of powers of the generator L, the above is precisely
Taking the difference in expectations between \({\mathbb {E}}\) and \({\tilde{{\mathbb {E}}}}\), upon justifying swapping the expectation with the infinite sum (as done in Sect. 2.2), and using the fact that
for every \(k_1,k_2,\ldots ,k_l\) such that \(k_1+\cdots +k_l= k\), we obtain that
The following structural property for words appearing in the above will allow us to reduce the analysis of multi-time observables to the combinatorial analysis of one-time observables \(f_{{\bar{{\varvec{\alpha }}}},{\bar{{\varvec{\gamma }}}}}=f^{(1)} f^{(2)} \cdots f^{(l)}\), for \({\bar{{\varvec{\alpha }}}} = {\varvec{\alpha }}^{(1)} \amalg \cdots \amalg {\varvec{\alpha }}^{(l)}\) and \({\bar{{\varvec{\gamma }}}} := {\varvec{\gamma }}^{(1)} \amalg \cdots \amalg {\varvec{\gamma }}^{(l)}\), which we have already completed.\(\square \)
Claim
Fix \(k_1,\ldots ,k_l \ge 0\) such that \(\sum _{i} k_i=k\) and words \(W_i\in \{L_{\mathbf {J}}, L_{\varvec{\Lambda }}, L_{\mathbf {h}},L_\Delta \}^{k_i}\), \(i=1,\ldots ,l\), with \(k_{\mathbf {J}}^{i}, k_{\varvec{\Lambda }}^{i}, k_{\mathbf {h}}^{i}, k_\Delta ^{i}\), of each appearing, respectively. Then, the function
consists of a sum of (not necessarily distinct) monomials of the form
Moreover, each monomial \(\phi ({{\mathbf {x}}})\) appearing in this expansion, must also appear in such monomial expansion of \(W f_{{\bar{{\varvec{\alpha }}}},{\bar{{\varvec{\gamma }}}}}\) for \(W = W_1 \cdots W_l \in \{L_{\mathbf {J}}, L_{\varvec{\Lambda }}, L_{\mathbf {h}},L_\Delta \}^{k}\).
Proof
The structure of the monomials is evident. Every such monomial in \(W_1 f^{(1)} W_2 f^{(2)} \cdots W_l f^{(l)}\) must also appear in the monomial expansion of \([W_1 \cdots W_l] f_{{\bar{{\varvec{\alpha }}}},{\bar{{\varvec{\gamma }}}}}\) because a subset of the terms in the latter are obtained by applying the letters in \(W_l\) to \(f^{(l)}\), then the letters in \(W_{l-1}\) to \(f^{(l-1)} (W_l f^{(l)})\), and so on. Finally, observe that \(W_1 \cdots W_l\) is always a word in \(\{L_{\mathbf {J}}, L_{\varvec{\Lambda }}, L_{\mathbf {h}}, L_\Delta \}^{k}\). \(\square \)
With Claim 3.1 in hand, we further get that
where the sums are over the monomials \(\phi \) in the decomposition of \(W_1 f^{(1)} \cdots W_l f^{(l)}\) and that of \(W f_{{\bar{{\varvec{\alpha }}}},{\bar{{\varvec{\gamma }}}}}\) per Claim 3.1. Note that each summand on the rhs of (3.4) is at most some \((k+1)^l l^k\) times the corresponding summand of (2.14) for the choice \(f=f_{{\bar{{\varvec{\alpha }}}},{\bar{{\varvec{\gamma }}}}}\) for which we have deduced the bound of (2.18). Utilizing the latter and the elementary bound \(k+1 \le (k_{\mathbf {J}}+1)(k+1-k_{\mathbf {J}})\), by proceeding as in the derivation of (2.19), we find that for \(C=C({\bar{r}},{\bar{s}},C_\mu ({\bar{r}}),{{\mathbf {C}}}_\star )\) finite, \(\delta = 1/(16 \, l \, T {{\bar{r}}}\, e \, {{\mathbf {C}}}_\star )\) positive and \(N \ge (2/\delta )^2\),
for some finite \({\bar{C}}={\bar{C}}(l,{\bar{r}},{\bar{s}}, T, {\mathbf {C}}_\star ,C_\mu ({\bar{r}}))\). \(\square \)
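The elementary bound \(k+1 \le (k_{\mathbf {J}}+1)(k+1-k_{\mathbf {J}})\) used in the preceding proof follows from the identity \((k_{\mathbf {J}}+1)(k+1-k_{\mathbf {J}}) - (k+1) = k_{\mathbf {J}}(k-k_{\mathbf {J}}) \ge 0\) for \(0 \le k_{\mathbf {J}}\le k\). The following minimal sketch checks that identity exhaustively for small k; the function names `slack` and `bound_holds_up_to` are ours, introduced only for illustration.

```python
def slack(k: int, k_j: int) -> int:
    """Return (k_J + 1)(k + 1 - k_J) - (k + 1) for a word of length k with k_J letters L_J."""
    return (k_j + 1) * (k + 1 - k_j) - (k + 1)


def bound_holds_up_to(k_max: int) -> bool:
    """Check that the slack equals k_J * (k - k_J) >= 0 for all 0 <= k_J <= k <= k_max."""
    return all(
        slack(k, k_j) == k_j * (k - k_j) >= 0
        for k in range(k_max + 1)
        for k_j in range(k + 1)
    )
```

Equality holds exactly when \(k_{\mathbf {J}}\in \{0,k\}\), so the bound costs nothing in those extreme cases.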
We now add in the driving martingale observables (i.e., \(m>0\)) and conclude the proof of Proposition 3.1.
Proof
We reduce the situation \(m>0\) to the combinatorial calculations of Lemma 3.2 by utilizing the following expansion from Ito’s lemma:
When expanding (3.1) in this manner, the terms containing only products of \(X_{\xi _i}(u_i)\) can be absorbed into \({\varvec{\gamma }}\), in which case their difference in expectations has already been handled in Lemma 3.2, so by linearity it suffices for us to focus on handling terms of the form
where \({\varvec{\tau }}= (\tau _1,\ldots ,\tau _m) \in [0,T]^m\) and where, setting \(f^{(i)}({{\mathbf {x}}})=f_{{\varvec{\alpha }}^{(i)},{\varvec{\gamma }}^{(i)}}({{\mathbf {x}}})\),
Thus, fixing l, m, \(({\varvec{\alpha }}^{(i)}),({\varvec{\gamma }}^{(i)}),{\varvec{\xi }}\) and letting \(h({\mathbf {t}}, {\mathbf {u}}) = h_{({\varvec{\alpha }}^{(i)}),({\varvec{\gamma }}^{(i)}),{\varvec{\xi }}}({\mathbf {t}},{\mathbf {u}}) \) we obtain after swapping the expectation and integrals that
which thereby yields the following bound on the relevant difference in expectations
Proceeding hereafter wlog to bound the difference in expectations for \({\widehat{h}}({\mathbf {t}},{\varvec{\tau }})\), we suppose for ease of exposition that \(0 \le t_l = \tau _0 \le \tau _1\le \cdots \le \tau _m\) (the situation where the two groups intertwine is similarly analyzed with the obvious modifications). As done in the proof of Lemma 3.2, first expressing \({\mathbb {E}}_{{\mathbf {B}}}\) in terms of the semi-group operator and then expanding that in powers of the generator L we find that
At this point, proceeding as in the derivation of (3.4), up to the transformations
we first use (3.3) to get the bound
with the sum running over monomial decomposition of \((W_1 f^{(1)} \cdots W_l f^{(l)} W_1' x_{\xi _1} \cdots W'_m x_{\xi _m}) ({{\mathbf {x}}})\). Then, utilizing again Claim 3.1, as well as the bound \(k! \ge {\bar{k}}!/({\bar{k}})^m\), we arrive at
where as before \({\bar{{\varvec{\alpha }}}} = {\varvec{\alpha }}^{(1)}\amalg \cdots \amalg {\varvec{\alpha }}^{(l)}\) is of length \({\bar{s}} = \sum _i s_i\), while \({\bar{{\varvec{\gamma }}}}\) of length \({\bar{r}} = \sum r_i +m\) has now the additional elements \((x_{\xi _i})_{i\le m}\). Up to this update of \({\bar{r}}\) and the immaterial weight factor \(({\bar{k}}/({\bar{l}} T))^m\) of its summands, the expression on the rhs of (3.5) is the same as that in (3.4). We thus conclude as in the proof of Lemma 3.2 that for some \(C(l,m,{\bar{r}}, {\bar{s}}, T, {\mathbf {C}}_\star ,C_\mu ({\bar{r}}))\) all \({\mathbf {t}}\in [0,T]^l\) and \({\mathbf {u}}\in [0,T]^m\),
\(\square \)
3.2 Proof of Theorem 1.
Fix T, m, p, \(C_{\mathbf {a}}\), \({\mathbf {a}}\in {\mathbb {R}}^{N^{m}}\) such that \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\), and \({\mathbf {t}}\in [0,T]^p\). For every \(\ell \le m\), fix observables \({\mathcal {Y}}^{(\ell ,1)},\ldots ,{\mathcal {Y}}^{(\ell ,p)}\in {{\mathfrak {F}}}\) and let \(F({\mathbf {t}})\) be as in (1.5) with those choices. By linearity of expectations and the uniform bound on \(\Vert {\mathbf {a}}\Vert _\infty \), it suffices to show that uniformly over \(i_1,\ldots ,i_m\),
We denote by \({\bar{s}}\) the number of \({\mathcal {Y}}\) terms appearing in the preceding product which is a coordinate of \({\mathbf {G}}_t\). In case \({\bar{s}}=0\), the bound (3.6) follows from considering Proposition 3.1 at \({\bar{s}} =0\), in which case \(I^+_{{\bar{{\varvec{\alpha }}}},1}=1\). Otherwise, we expand every term in that product which is a coordinate of \({\mathbf {G}}_t\) to obtain a sum of monomials of the form of (3.1). Each of these monomials has a sequence \({\bar{{\varvec{\alpha }}}}\) of length \({\bar{s}}\), and as a result of such expansion there are at most \({\bar{s}}^{{\bar{s}}} N^{I_{{\bar{{\varvec{\alpha }}}}}}\) monomials with precisely \(I_{{\bar{{\varvec{\alpha }}}}}\) distinct pairs in the sequence \({\bar{{\varvec{\alpha }}}}\). Note that for any \({\bar{{\varvec{\alpha }}}}\),
Indeed, each pair which appears once in \({\bar{{\varvec{\alpha }}}}\), is counted both in \({\bar{s}}\) and in \(I_{{\bar{{\varvec{\alpha }}}},1}\), all other pairs are counted at least twice in \({\bar{s}}\), and for any \({\bar{{\varvec{\alpha }}}}\) of maximal multiplicity two, we have added one to \(I^+_{{\bar{{\varvec{\alpha }}}},1}\). Consequently, the bound of Proposition 3.1 on the difference in expectation for each of these \({\bar{s}}^{{\bar{s}}} N^{I_{{\bar{{\varvec{\alpha }}}}}}\) many monomials is at most \(C N^{-I_{{\bar{{\varvec{\alpha }}}}}-1/2}\) for some constant \(C(T, m,p, {\mathbf {C}}_{\star }, C_\mu )\). From this, the bound (3.6) immediately follows upon enumerating over the at most \({\bar{s}}\) many choices for \(I_{{\bar{{\varvec{\alpha }}}}}\). \(\square \)
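The counting argument of the preceding paragraph can be checked by brute force. The sketch below assumes that \(I^+_{{\bar{{\varvec{\alpha }}}},1}\) equals \(I_{{\bar{{\varvec{\alpha }}}},1}\) plus one whenever no pair appears more than twice in \({\bar{{\varvec{\alpha }}}}\) (our reading of the text, including the convention \(I^+_{{\bar{{\varvec{\alpha }}}},1}=1\) at \({\bar{s}}=0\)), and it verifies one inequality which that argument yields, namely \(2 I_{{\bar{{\varvec{\alpha }}}}} \le {\bar{s}} + I^+_{{\bar{{\varvec{\alpha }}}},1} - 1\). Pairs are abstracted as integer labels, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def counts(alpha):
    """Return (s, I, I_1, I_1_plus) for a sequence alpha of pair labels."""
    mult = Counter(alpha)
    s = len(alpha)                      # length of the sequence
    distinct = len(mult)                # number of distinct pairs
    once = sum(1 for c in mult.values() if c == 1)  # pairs appearing exactly once
    max_mult = max(mult.values(), default=0)
    # Assumption: one is added when no pair appears more than twice.
    once_plus = once + (1 if max_mult <= 2 else 0)
    return s, distinct, once, once_plus


def inequality_holds(max_len: int, alphabet_size: int) -> bool:
    """Check 2*I <= s + I_1_plus - 1 over all sequences up to the given length."""
    for s in range(max_len + 1):
        for alpha in product(range(alphabet_size), repeat=s):
            s_bar, distinct, _, once_plus = counts(alpha)
            if 2 * distinct > s_bar + once_plus - 1:
                return False
    return True
```

Equality occurs, for instance, when all pairs are distinct, or when a single pair appears exactly twice or exactly three times.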
4 Concentration for quadratic observables: Proof of Theorem 2
Assuming henceforth that \({\mathbf {M}}_t\) is a scaled Brownian motion (i.e., that \(\sigma _{ij}\) are identically zero for \(i\ne 0\)), our goal is to prove Theorem 2 about the uniform over \({\mathbf {t}}\in [0,T]^2\) concentration property of the quadratic observable of (1.10),
(for uniformly bounded non-random \({\mathbf {a}}= (a_i)_{i}\) and \({\mathcal {Y}}, {\mathcal {Y}}'\) in the collection \({{\mathfrak {F}}} = \{{\mathbf {1}}_t, {\mathbf {X}}_t, {\mathbf {G}}_t, {\mathbf {M}}_t\}\) of (1.4)). To this end, we introduce in Sect. 4.1 high probability localizing sets \({\mathcal {L}}_{N,R}\) on which various norms of \({\mathbf {X}}_t\) (and our observables \(F({\mathbf {t}})\)), are uniformly bounded. Sect. 4.2 shows that on \({\mathcal {L}}_{N,R}\), such \(F({\mathbf {t}})\) are \(O(N^{-1/2})\)-Lipschitz in a mixed \(\ell ^2\)-norm. Combining these facts we prove Theorem 2 in Sect. 4.3.
4.1 Localizing the process
Denote the 2-to-2 matrix norm by
and for each constant R consider the following localization subset of \({{\mathcal {E}}}_N := {\mathbb {R}}^N \times {\mathbb {R}}^{N^2} \times C([0,T],{\mathbb {R}}^N)\),
We begin by bounding the probability that \(({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}})\notin {\mathcal {L}}_{N,R}\).
Lemma 4.1
There exists \(C= C(T, C_\mu , C_{\mathbf {A}},C_{\varvec{\sigma }})>0\) and \(R_0(T,C_\mu ,C_{\mathbf {A}},C_{\varvec{\sigma }})<\infty \), such that for every \(R \ge R_0\) if \(\mu ,{\mathbb {P}}_{{\mathbf {A}}}\) satisfy Hypotheses 1–2, then
Proof
We bound \({\mathcal {L}}_{N,R}^c\) by the union of the events where each of the three norms is greater than \(\sqrt{RN/3}\). First, since \({\mathbf {M}}_t\) is a Brownian motion (scaled by \((\sigma _{0j})_j\)), by Doob’s maximal inequality for the sub-martingale \(\exp (\delta \Vert {\mathbf {M}}_t\Vert ^2)\), we have for some \(C(C_{\varvec{\sigma }})>0\), any \(R \ge T R_0(C_{\varvec{\sigma }})\) and all N,
Next, since \(\mu \) satisfies Hypotheses 1–2, the independent \(X_i(0)\) have uniform (in i and N) second moments and exponential tails. Hence, applying [30, Theorem 3] for the centered sum of i.i.d. variables that stochastically dominate \(X_i^2(0)\), we have for some \(C(C_\mu )>0\), any \(R \ge R_0(C_\mu )\) and all N,
It thus remains only to show that when \({\mathbb {P}}_{{\mathbf {A}}}\) satisfies Hypothesis 2, we have for some \(C(C_{\mathbf {A}})>0\), any \(R \ge R_0(C_{\mathbf {A}})\) and all N,
To this end, recall [28, Theorem 2] that there exists a universal constant C such that for any matrix \({\mathbf {A}}\) with independent, zero-mean entries of second moments \(m_{ij}\) and fourth moments \(b_{ij}\),
For \({\mathbb {P}}_{{\mathbf {A}}}\) satisfying Hypothesis 1, \(b_{ij}\) and \(m_{ij}\) are bounded uniformly in i, j and N (see (1.8)). Hence, in the case where \({\mathbf {A}}\) is composed of independent entries, for some \(C(C_{\mathbf {A}})\) finite and all N,
Likewise, representing a symmetric \({\mathbf {A}}\) as \({\mathbf {A}}= {\mathbf {A}}^+ + {\mathbf {A}}^-\), with \({\mathbf {A}}^+\) the upper triangle (including the diagonal) part of \({\mathbf {A}}\) and \({\mathbf {A}}^-\) its lower triangle part, [28, Theorem 2] holds for the matrices \({\mathbf {A}}^-\) and \({\mathbf {A}}^+\) of zero-mean, independent entries (with uniformly bounded fourth moments). Thus, (4.4) holds also in this case up to a factor of 2. Thanks to (4.4), if \(\sqrt{R} \ge 4 C\) then
Recall that \(\Vert {\mathbf {A}}\Vert _{2\rightarrow 2}\), which is the largest singular value of \({\mathbf {A}}\), is 1-Lipschitz in its entries (endowed with the Euclidean norm, on \({\mathbf {A}}^+\) when \({\mathbf {A}}\) is assumed symmetric). Indeed, this follows by combining the triangle inequality \(|\Vert {\mathbf {A}}\Vert _{2\rightarrow 2} - \Vert {\mathbf {B}}\Vert _{2\rightarrow 2}|\le \Vert {\mathbf {A}}- {\mathbf {B}}\Vert _{2\rightarrow 2}\) with the domination of the operator norm by the Frobenius norm, \(\Vert {\mathbf {A}}- {\mathbf {B}}\Vert _{2\rightarrow 2} \le \Vert {\mathbf {A}}- {\mathbf {B}}\Vert _{F}\). Hypothesis 2 for \({\mathbb {P}}_{{\mathbf {A}}}\) thus yields the bound (4.3). \(\square \)
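The 1-Lipschitz property of the operator norm with respect to the Frobenius norm, used at the end of the preceding proof, can be illustrated numerically. The sketch below is restricted to \(2\times 2\) matrices (an assumption made only so that the largest singular value has a closed form and no linear algebra library is needed); it samples Gaussian pairs \(({\mathbf {A}},{\mathbf {B}})\) and records the largest value of \(|\Vert {\mathbf {A}}\Vert _{2\rightarrow 2} - \Vert {\mathbf {B}}\Vert _{2\rightarrow 2}| - \Vert {\mathbf {A}}-{\mathbf {B}}\Vert _F\), which should never be positive. All function names are ours.

```python
import math
import random


def frobenius(a):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in a for x in row))


def op_norm_2x2(a):
    """Largest singular value of a 2x2 matrix via the eigenvalues of A^T A."""
    t = sum(x * x for row in a for x in row)          # trace of A^T A
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]       # det(A), so det(A^T A) = det^2
    disc = max(t * t - 4.0 * det * det, 0.0)          # clamp rounding noise
    return math.sqrt((t + math.sqrt(disc)) / 2.0)


def max_lipschitz_violation(trials: int = 1000, seed: int = 0) -> float:
    """Largest observed |‖A‖ - ‖B‖| - ‖A - B‖_F over random Gaussian pairs."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        a = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
        b = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
        diff = [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]
        gap = abs(op_norm_2x2(a) - op_norm_2x2(b)) - frobenius(diff)
        worst = max(worst, gap)
    return worst
```

A non-positive return value is consistent with the Lipschitz claim; the check is of course no substitute for the two-line argument given in the proof.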
We further have on the sets \({\mathcal {L}}_{N,R}\) the following localization for both \(({\mathbf {X}}_t)_{t\in [0,T]}\) and \(({\mathbf {G}}_t)_{t \in [0,T]}\).
Proposition 4.2
There exists \(R_0(T,{\mathbf {C}}_\star )\) and \(C_0({\mathbf {C}}_\star )\) such that if \(R \ge R_0\), and \(({\mathbf {X}}_0, {\mathbf {J}},{\mathbf {M}})\in {\mathcal {L}}_{N,R}\), then
In addition, for every \({\mathbf {a}}\) such that \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\) (uniformly over N) and every \({\mathcal {Y}}, {\mathcal {Y}}' \in {{\mathfrak {F}}}\), if \(F({\mathbf {t}})\) is as in (1.10), we have for all \(k \ge 1\),
Proof
Setting \(e_N(t) = \frac{1}{\sqrt{N}}\Vert {\mathbf {X}}_t\Vert \), we get upon expanding (1.2), that
From the definition of the 2-to-2 norm, evidently
Hence, by Cauchy–Schwarz,
where in the last inequality we rely on our assumption that \(\Vert {\varvec{\Lambda }}\Vert _{1\rightarrow 1} \le C_{\varvec{\Lambda }}\) and \(\Vert {\varvec{\Lambda }}\Vert _{\infty \rightarrow \infty } \le C_{\varvec{\Lambda }}\), to deduce that \(\Vert {\varvec{\Lambda }}\Vert _{2 \rightarrow 2} \le C_{\varvec{\Lambda }}\). Combining these bounds on \((I_i)_{i\le 5}\), and dividing out by \(e_N(t)\), we see that
By Gronwall’s inequality, using the localization to \({\mathcal {L}}_{N,R}\), it then follows that for any \(t \in [0,T]\),
yielding the lhs of (4.5) as soon as \(R \ge R_0(T,{{\mathbf {C}}}_\star ) \ge 1\). From the lhs of (4.8) we know that \(\Vert {\mathbf {G}}_t\Vert \le \sqrt{R} \, \Vert {\mathbf {X}}_t\Vert \) throughout \({\mathcal {L}}_{N,R}\), hence after suitably increasing \(C_0\) and \(R_0\), the rhs of (4.5) holds as well.
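The Gronwall step above has the following generic form: if \(e(t) \le e_0 + \int _0^t (a\, e(u) + b)\, du\) with \(a,b,e_0 \ge 0\), then \(e(t) \le (e_0 + b t) e^{a t}\). The sketch below illustrates this on the extremal case \(e' = a e + b\), integrated by an explicit Euler scheme. The constants `a`, `b`, `e0` are hypothetical stand-ins and are not the specific constants of the displayed inequality.

```python
import math


def euler_extremal(e0: float, a: float, b: float, t_end: float, steps: int):
    """Euler path of e' = a*e + b, the extremal case of the integral inequality."""
    dt = t_end / steps
    e = e0
    path = [e]
    for _ in range(steps):
        e += dt * (a * e + b)   # one explicit Euler step
        path.append(e)
    return path


def gronwall_bound(e0: float, a: float, b: float, t: float) -> float:
    """Gronwall upper bound (e0 + b*t) * exp(a*t)."""
    return (e0 + b * t) * math.exp(a * t)


def path_below_bound(e0: float, a: float, b: float, t_end: float, steps: int) -> bool:
    """Check that every point of the Euler path lies below the Gronwall bound."""
    dt = t_end / steps
    path = euler_extremal(e0, a, b, t_end, steps)
    return all(
        e <= gronwall_bound(e0, a, b, i * dt) + 1e-12
        for i, e in enumerate(path)
    )
```

In the localized setting one would expect `a` to scale like \(\sqrt{R}\), which is consistent with the exponential factor \(e^{C_0 \sqrt{R} T}\) appearing later in the text.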
To deduce the uniformly bounded moment estimate of (4.6) for \({\mathbf {X}}_t\), recall first from the lhs of (4.5) that
Combining the latter bound with that of Lemma 4.1, we arrive at
The rhs decreases in N and as \(f'(R)=(C_0 T k)/(2 \sqrt{R}) f(R)\), it is finite for \(\sqrt{N}/C > C_0 T k\), yielding the lhs of (4.6). The rhs of (4.6) follows by applying the same reasoning to \(Z_{N,{\mathbf {G}}}^k =\big ( N^{-1/2} \sup _{t \in [0,T]} \Vert {\mathbf {G}}_t\Vert \big )^k\) while utilizing the rhs of (4.5).
Turning to (4.7), note that for any \(k \ge 1\) and \(F({\mathbf {t}})\) of (1.10) with \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\), by Cauchy–Schwarz,
Thus, yet another application of Cauchy–Schwarz yields
If \({\mathcal {Y}}\) is \({\mathbf {1}}\), this latter expectation is simply 1. If \({\mathcal {Y}}\) is \({\mathbf {M}}\), using the tail bound of (4.2) in combination with (4.9) (now for \(f(R) = (R/3)^k\)), the latter expectation is uniformly bounded in N. Lastly if \({\mathcal {Y}}\) is from \(\{{\mathbf {X}},{\mathbf {G}}\}\), the expectation above is uniformly bounded in N by (4.6). Combining these yields the desired (4.7). \(\square \)
4.2 A Lipschitz estimate on quadratic observables
Our next proposition shows that on \({\mathcal {L}}_{N,R}\) all \(F({\mathbf {t}})\) of the form (1.10) are \(O(N^{-1/2})\)-Lipschitz in the \(({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}})\) endowed with the following mixed 2-norm on \({{\mathcal {E}}}_N\),
(which is taken from [3, Hypothesis 1.1]).
Proposition 4.3
Fixing \({\mathbf {a}}\) such that \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\) and \({\mathcal {Y}}, {\mathcal {Y}}'\in {{\mathfrak {F}}}\), denote by \(F({\mathbf {t}}; ( {\mathbf {X}}_0, {\mathbf {J}},{\mathbf {M}}))\) the observable in (1.10) evaluated on the trajectory \({\mathbf {X}}_t\) constructed out of the triplet \(({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}})\). There exist \(R_0(T,C_{\mathbf {a}}, {\mathbf {C}}_\star )\) and \(C(T, C_{\mathbf {a}}, {\mathbf {C}}_{\star })\) such that for any \(R\ge R_0\) all N and \(({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}}), ({\mathbf {X}}'_0,{\mathbf {J}}',{\mathbf {M}}')\) in \({\mathcal {L}}_{N,R}\)
The key to Proposition 4.3 is to show that \({\mathbf {X}}_t\) is O(1)-Lipschitz on \({\mathcal {L}}_{N,R}\) endowed with \(\Vert \cdot \Vert _{\textsc {mix}}\). Specifically, denoting by \({\mathbf {X}}_t = {\mathbf {X}}_t({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}})\) the solution to (1.2) constructed from the triplet \(({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}})\), and by \({\mathbf {X}}_t' = {\mathbf {X}}_t({\mathbf {X}}_0', {\mathbf {J}}', {\mathbf {M}}')\) the solution constructed from the triplet \(({\mathbf {X}}_0', {\mathbf {J}}', {\mathbf {M}}')\), our next lemma establishes a uniform over \({\mathcal {L}}_{N,R}\) Lipschitz bound on \(\Vert {\mathbf {X}}_t-{\mathbf {X}}'_t\Vert \).
Lemma 4.4
There exist \(R_0(T, {\mathbf {C}}_\star ),C(T,{\mathbf {C}}_\star )\) such that for all \(R\ge R_0\) and \(({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}}),( {\mathbf {X}}'_0 , {{\mathbf {J}}}', {\mathbf {M}}') \in {\mathcal {L}}_{N,R}\),
Proof
Following the strategy of proof of [3, Lemma 2.6], we let
and expanding over \(j \le N\), we have by the definition of the solution \({\mathbf {X}}_t\) for the sds (1.2)–(1.3), that
where \({\mathbf {G}}'(\cdot )\) is defined as \({\mathbf {G}}(\cdot )\) but constructed using \({\mathbf {J}}'\) instead of \({\mathbf {J}}\). By Cauchy–Schwarz,
Recalling (4.8), we similarly find that
Turning to the terms involving \({\mathbf {G}}(\cdot )\) or \({\mathbf {G}}'(\cdot )\), observe first that
Using the localization to \({\mathcal {L}}_{N,R}\), we thus find that
where in the last inequality we further assumed \(R \ge R_0(T,{{\mathbf {C}}}_\star )\), utilizing the lhs of (4.5). Further increasing \(R_0\) such that \(T e^{C_0 \sqrt{R_0} T} \ge 1\), upon combining the bounds on \((I_i)_{i\le 5}\), and dividing out by \(e_N(t)\), we see that
Recall that \(\Vert {\mathbf {J}}\Vert ^2_{2 \rightarrow 2} \le \sum _{ij} J_{ij}^2\), so by Gronwall’s inequality, there exist \(C(T,{\mathbf {C}}_\star )\), such that
for any \(R\ge R_0\), every N and all \(t\in [0,T]\), as claimed. \(\square \)
Proof
Fix \({\mathcal {Y}}^1, {\mathcal {Y}}^2 \in {{\mathfrak {F}}}\), \({\mathbf {a}}\) such that \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\) and \({\mathbf {t}}= (t_1,t_2)\in [0,T]^2\). Equipped with Lemma 4.4 and (4.11) it remains to establish a Lipschitz control on differences of \(F({\mathbf {t}}; ({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}}))\) in terms of differences of \(\Vert {\mathbf {G}}_t\Vert \), \(\Vert {\mathbf {X}}_t\Vert \) and \(\Vert {\mathbf {M}}_t\Vert \) corresponding to any pair of triplets \(({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}})\) and \(({\mathbf {X}}'_0,{\mathbf {J}}', {\mathbf {M}}')\) in \({\mathcal {L}}_{N,R}\). To this end, we start with the following bound on differences of \(F({\mathbf {t}}; \cdot )\):
Since the two terms on the RHS can be bounded symmetrically, wlog we focus on the first one, which by Cauchy–Schwarz, is at most
where as before, \({\mathbf {X}}_t'\) is constructed out of the triplet \(({\mathbf {X}}'_0 , {\mathbf {J}}', {\mathbf {M}}')\). Now recall from \(({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}})\in {\mathcal {L}}_{N,R}\) and Proposition 4.2, that the right-most term in (4.12) is at most \(\exp (C_0 \sqrt{R} T)\) for all \(R \ge R_0\), in which case by the preceding
Recall Lemma 4.4 and (4.11), to deduce that for some \(C(T,{\mathbf {C}}_\star )>0\), every \(R\ge R_0\), and all \(({\mathbf {X}}_0, {\mathbf {J}}, {\mathbf {M}}), ({\mathbf {X}}'_0, {\mathbf {J}}', {\mathbf {M}}')\in {\mathcal {L}}_{N,R}\), we have
Putting these all together, we deduce that there exists some other \(R_0 (T, {\mathbf {C}}_\star )\) and \(C(T,C_{\mathbf {a}},{\mathbf {C}}_\star )\), such that for all \(R\ge R_0(T, {\mathbf {C}}_{\star })\),
\(\square \)
We conclude this subsection by combining the respective exponential concentrations of Lipschitz functions due to \(\mu \), \({\mathbb {P}}_{\mathbf {A}}\) and \({\mathbb {P}}_{\mathbf {B}}\).
Lemma 4.5
Suppose that \(\mu , {\mathbb {P}}_{{\mathbf {A}}}\) satisfy Hypothesis 2. Then \({\mathbb {P}}=\mu \otimes {\mathbb {P}}_{\mathbf {A}}\otimes {\mathbb {P}}_{\mathbf {B}}\) satisfies exponential concentration of Lipschitz functions with respect to \(({{\mathcal {E}}}_N, \Vert \cdot \Vert _{\textsc {mix}})\).
Proof
Fix any function f that is 1-Lipschitz on \(({{\mathcal {E}}}_N, \Vert \cdot \Vert _{\textsc {mix}})\). Let us expand
where the subscripts of the expectations indicate which random variables the expectation is taken over. Call the above three differences \(I_{{\mathbf {M}}}, I_{{\mathbf {J}}}\) and \(I_{{\mathbf {X}}_0}\) say. For every \({\mathbf {X}}_0,{\mathbf {J}}\) fixed, \( f({{\mathbf {X}}_0, {\mathbf {J}},{\mathbf {M}}})\) is 1-Lipschitz in \({\mathbf {M}}\in C([0,T], {\mathbb {R}}^N)\) endowed with the norm \(\sup _{t\le T} \Vert \cdot \Vert \). As such, from the exponential concentration of Lipschitz functions satisfied by \({\mathbb {P}}_{{\mathbf {B}}}\) with respect to \(C([0,T],{\mathbb {R}}^N)\) endowed with \(\sup _{t\le T} \Vert \cdot \Vert \) (see e.g., the discussion around [3, Hypothesis 1.1]), there exists \(C= C(C_{\varvec{\sigma }})>0\) such that for every \(r>0\),
Similarly, we have that for every fixed \({\mathbf {X}}_0\), \({\mathbb {E}}_{\mathbf {B}}[f({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}})]\) is 1-Lipschitz in \({\mathbf {J}}\) endowed with its rescaled Frobenius norm \(\sum _{i,j} (\sqrt{N} J_{ij})^2\), and finally, \({\mathbb {E}}_{{\mathbf {J}},{\mathbf {B}}} [ f({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}})]\) is 1-Lipschitz in \({\mathbf {X}}_0\) endowed with its \(\ell ^2\) norm. Altogether, expanding
we see that the exponential concentrations for 1-Lipschitz functions of \(\mu , {\mathbb {P}}_{\mathbf {A}}\) and \({\mathbb {P}}_{{\mathbf {B}}}\) lift to exponential concentration of \({\mathbb {P}}\) for functions that are 1-Lipschitz in the triplet \(({\mathbf {X}}_0, {\mathbf {J}},{\mathbf {M}})\) on \(({{\mathcal {E}}}_N, \Vert \cdot \Vert _{\textsc {mix}})\). \(\square \)
4.3 Proof of Theorem 2
We first prove a concentration estimate for F at a fixed pair of times \({\mathbf {t}}\in [0,T]^2\), before extending this to the full trajectory \((F({\mathbf {t}}))_{{\mathbf {t}}\in [0,T]^2}\) by bounding the modulus of continuity of F.
Proposition 4.6
Suppose \(\mu \), \({\mathbb {P}}_{\mathbf {A}}\) satisfy Hypotheses 1–2. There exist \(C(T,C_{\mathbf {a}}, {\mathbf {C}}_\star , C_\mu )\) large, such that for any F as in (1.10) with \(\Vert {\mathbf {a}}\Vert _\infty \le C_{\mathbf {a}}\), \({\mathcal {Y}}, {\mathcal {Y}}'\in {{\mathfrak {F}}}\), all \({\mathbf {t}}\in [0,T]^2\), \(\lambda >0\) and \(N \ge N_0(T,C_{\mathbf {a}},{\mathbf {C}}_\star ,C_\mu )\),
Proof
In proving [3, Lemma 2.5] it is shown, using a Lipschitz extension, that if \({\mathbb {P}}\) satisfies exponential concentration for Lipschitz functions as in (1.11) and V is an A-Lipschitz function on a set \({\mathcal {L}}\) on which |V| is uniformly bounded by K, then for some universal constant \(C>0\) and every \(\lambda >0\),
Recall from Lemma 4.5 that \({\mathbb {P}}= \mu \otimes {\mathbb {P}}_{\mathbf {A}}\otimes {\mathbb {P}}_{\mathbf {B}}\) satisfies exponential concentration for Lipschitz functions in \(({{\mathcal {E}}}_N, \Vert \cdot \Vert _{\textsc {mix}})\) and Proposition 4.3 that \(V=F({\mathbf {t}};\cdot )\) is \(\frac{D(R)}{\sqrt{N}}\)-Lipschitz on \({\mathcal {L}}={\mathcal {L}}_{N,R}\) for \(D(R)=C_1 e^{C_1 \sqrt{R}}\), for some \(C_1 (T, C_{\mathbf {a}}, {\mathbf {C}}_\star )\) for every \(R \ge R_0(T,C_{\mathbf {a}},{\mathbf {C}}_\star )\), all N, and every F, \({\mathbf {t}}\) as in Theorem 2.
Further, increasing \(R_0\) as needed for Lemma 4.1 and Proposition 4.2, yields
as well as guaranteeing that \(C_2^2 := \sup _{N,{\mathbf {t}}} \{{\mathbb {E}}[F({\mathbf {t}})^2]\}\) is finite and that \({\mathbb {P}}({\mathcal {L}}_{N,R}^c) \le \exp (-\sqrt{RN}/C_3)\) for some \(C_3(T, C_\mu , C_{{\mathbf {A}}},C_{{\varvec{\sigma }}})\). Plugging all this into (4.15) gives us a family of upper bounds for \(R \ge R_0\),
For \(R=R_0\) we can embed the constant factor \(2 D(R_0)\) into C and further adjust \(C_3\) to bound the pre-exponent \(2 (C_2+K(R_0))\) within the factor \(\exp (-\sqrt{R_0 N}/(2C_3))\) multiplying it, resulting in \(q_N(\lambda ;R_0)\) as in the top line on the rhs of (4.14). For a better tail decay, consider \(R_\lambda =(\eta \log \lambda )^2 \ge R_0\), with \(\eta =1/(2C_1)\) so \(D(R_\lambda )= C_1 e^{C_1 \eta \log \lambda } \le C_1 \lambda /\log \lambda \) for all \(\lambda \ge 4\). In addition, once \(\sqrt{N}/(2C_3) \ge 4 C_0 T\) we can again embed the pre-exponent \(2(C_2+K(R_\lambda ))/\lambda \) within the factor \(\exp (-\sqrt{R_\lambda N}/(2 C_3))\) multiplying it. Thus, upon adjusting the various constants we end up with \(q_N(\lambda ;R_\lambda )\) as in the bottom line on the rhs of (4.14). \(\square \)
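The choice \(\eta = 1/(2C_1)\) in the preceding proof gives \(D(R_\lambda ) = C_1 e^{(\log \lambda )/2} = C_1 \sqrt{\lambda }\), so the claimed bound \(D(R_\lambda ) \le C_1 \lambda /\log \lambda \) amounts to \(\log \lambda \le \sqrt{\lambda }\). The sketch below evaluates the three quantities for given \(\lambda \ge 4\) and \(C_1>0\). The numerical values of \(C_1\) used in any call are hypothetical, and the side condition \(R_\lambda \ge R_0\) is not checked here.

```python
import math


def d_of_r(r: float, c1: float) -> float:
    """D(R) = C_1 * exp(C_1 * sqrt(R)), as in the proof of Proposition 4.6."""
    return c1 * math.exp(c1 * math.sqrt(r))


def r_lambda(lam: float, c1: float) -> float:
    """R_lambda = (eta * log(lambda))^2 with eta = 1 / (2 * C_1)."""
    eta = 1.0 / (2.0 * c1)
    return (eta * math.log(lam)) ** 2


def compare(lam: float, c1: float):
    """Return (D(R_lambda), C_1 * sqrt(lambda), C_1 * lambda / log(lambda))."""
    d = d_of_r(r_lambda(lam, c1), c1)
    return d, c1 * math.sqrt(lam), c1 * lam / math.log(lam)
```

The first two entries of `compare` should agree up to rounding, and the first should not exceed the third, for every \(\lambda \ge 4\).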
Setting hereafter R for the larger of \(R_0\) and \(R_\lambda \) values from the preceding proof of Proposition 4.6, recall that the event \({\mathcal {L}}^c_{N,R}\) was already ruled out as part of the derivation of (4.14). Thus, proceeding to prove Theorem 2, we fix \(\varepsilon = N^{-k}\), \(k>1\), and apply Proposition 4.6 at the \(M_N = \lceil T N^k \rceil ^2\) grid points \({\mathbf {t}}_{i,j}=(i \varepsilon , j \varepsilon )\) within \([0,T]^2\), to deduce by the union bound that
It is easy to check that \(2 M_N q_N(\lambda )\) is further bounded by \(p_N(3 \lambda )\) of (1.12) once we suitably enlarge the constant C on the rhs of (1.12) relative to that of (4.14). In addition, since the right-most term in (4.15) exceeds one whenever \({\mathbb {E}}[|V| \mathbf{1}_{{\mathcal {L}}_{N,R}^c}] = {\mathbb {E}}[ |F({\mathbf {t}})| \mathbf{1}_{{\mathcal {L}}_{N,R}^c}] \ge \lambda /2\), if that inequality holds for any \({\mathbf {t}}\in [0,T]^2\), then \(q_N(\lambda )\) and in turn \(p_N(3\lambda )\) of (1.12) would exceed one. Thus, we may assume wlog that
We can then expand
Restricting to \(\lambda > 1/\sqrt{N}\) (as otherwise \(p_N(3 \lambda ) \ge 1\)), and using \(p_N(3 \lambda ) \gg M_N \exp (- (\lambda ^2 \wedge \lambda ) N^k/C')\) (as \(k>1\)) with the above, the stated bound of Theorem 2 follows from the following short-time estimates.
Lemma 4.7
There exists \(C'(C_{\varvec{\sigma }})\), such that for every \(\varepsilon \le 1\), \(\lambda \ge C' \varepsilon \), and F as in Theorem 2,
In particular, for any \(N \ge N_0(T,C_{\mathbf {a}},C_\mu , {\mathbf {C}}_\star )\) and \(\lambda \ge N^{-1/2} = \varepsilon ^{1/(2k)}\), \(k>1\),
Proof
Similarly to the computation leading to (4.13), we find that for any \({\mathbf {t}}+{\mathbf {s}}\in [0,T]^2\) and F as in Theorem 2, evaluated on the solution \({\mathbf {X}}_t({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}})\) that corresponds to some \(({\mathbf {X}}_0,{\mathbf {J}},{\mathbf {M}}) \in {\mathcal {L}}_{N,R}\)
When \({\mathcal {Y}}=\mathbf{1}\) this difference is zero, whereas in case \({\mathcal {Y}}= {\mathbf {X}}\) and \(s_i \le \varepsilon \), assuming wlog that \(R_0, {\mathbf {C}}_\star \ge 1\), we have on \({\mathcal {L}}_{N,R}\), by (4.5) and the rhs of (4.8), that
Further, similarly to the lhs of (4.11), on \({\mathcal {L}}_{N,R}\),
so up to an extra factor \(\sqrt{R}\) the bound (4.19) applies for \({\mathcal {Y}}={\mathbf {G}}\), and considering all cases we get for \({\mathbf {s}}\in [0,\varepsilon ]^2\),
For some \(C'>0\), when \(R=R_0\) and \(\lambda \ge C' \varepsilon \), the rightmost term in (4.20) cannot exceed \(\lambda /2\). The same applies for \(R=R_\lambda = (\eta \log \lambda )^2\) provided \(\eta \le 1/(3C_0 T)\). By the same reasoning, for such \(\eta \) and some \(C_4(T, C_{\mathbf {a}}, R_0)>0\), the factor multiplying \(\Vert {\mathbf {M}}_{t_i + s_i} - {\mathbf {M}}_{t_i} \Vert \) in (4.20) is in both cases at most \((\sqrt{\lambda } \vee 1)/(2 C_4 \sqrt{N})\). Recall from (4.2) and the stationarity of Brownian increments, that there exists \(C(C_{\varvec{\sigma }})\) such that for every \(L \ge \varepsilon ^2 L_0(C_{\varvec{\sigma }})\), every N,
Combining (4.20) and (4.21), we thus get that for some \(C'(C_{\varvec{\sigma }})\), for every \(\lambda \ge C' \varepsilon \), and every N, \({\mathbf {t}}=(t_1,t_2)\),
as claimed in (4.17). Next, by Cauchy–Schwarz, (4.7) and (4.17), there exists \(C(T, C_{\mathbf {a}}, C_\mu , {\mathbf {C}}_\star )\) such that for every \(N \ge N_0(T, C_{\mathbf {a}}, C_\mu , {\mathbf {C}}_\star )\), every \(\lambda \ge 2 C' \varepsilon \), every \({\mathbf {t}}\), \({\mathbf {s}}\) and all F,
Our assumption that \(\lambda \ge \varepsilon ^{1/(2k)}\) for some \(k>1\) guarantees that the rightmost term is at most \(\lambda /2\) (as soon as \(N \ge N_0\)), thereby establishing (4.18). \(\square \)
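The discretization behind the union bound in the proof of Theorem 2 can be made concrete as follows. The sketch assumes, for the sake of exact rational arithmetic, an integer exponent k (the text allows any real \(k>1\)) and the index convention \(i \in \{0,\ldots ,\lceil T N^k\rceil -1\}\) for the grid coordinates \(i \varepsilon \) with \(\varepsilon = N^{-k}\); both are our assumptions, as are the function names. It computes \(M_N = \lceil T N^k \rceil ^2\) and checks that each coordinate \(t\in [0,T]\) lies within \(\varepsilon \) above some grid coordinate, which is the regime \({\mathbf {s}}\in [0,\varepsilon ]^2\) covered by Lemma 4.7.

```python
from fractions import Fraction
from math import ceil, floor


def grid_size(t_max: Fraction, n: int, k: int) -> int:
    """Number M_N = ceil(T * N^k)^2 of grid points in [0, T]^2 (assumes T > 0)."""
    return ceil(t_max * n**k) ** 2


def grid_index(t: Fraction, t_max: Fraction, n: int, k: int) -> int:
    """Index i of the grid coordinate i * eps lying at most eps below t."""
    m = ceil(t_max * n**k)               # number of grid coordinates per axis
    return min(floor(t * n**k), m - 1)   # clamp so that the index stays in 0..m-1


def offset(t: Fraction, t_max: Fraction, n: int, k: int) -> Fraction:
    """Distance s = t - i * eps from t down to its grid coordinate."""
    eps = Fraction(1, n**k)
    return t - grid_index(t, t_max, n, k) * eps
```

Applying `offset` to each of the two time coordinates gives the shift \({\mathbf {s}}\) for which the short-time estimate is invoked.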
Acknowledgements
The authors thank the anonymous referee for useful comments, and Ramon van Handel and Ofer Zeitouni for helpful conversations. This project was supported in part by NSF Grants #DMS-1613091, #DMS-1954337 (A.D.), and by the Miller Institute for Basic Research in Science (R.G.).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Dembo, A., Gheissari, R. Diffusions interacting through a random matrix: universality via stochastic Taylor expansion. Probab. Theory Relat. Fields 180, 1057–1097 (2021). https://doi.org/10.1007/s00440-021-01027-7
DOI: https://doi.org/10.1007/s00440-021-01027-7
Keywords
- Stochastic differential equations
- Universality
- Markov semi-group
- Random matrices, Disordered systems
- Langevin dynamics
- Gradient flows