1 Introduction and main results

Let \({{\mathcal {A}}}\) be the adjacency matrix of the Erdős–Rényi graph \({{\mathcal {G}}}(N,p)\). Explicitly, \({{\mathcal {A}}} = ({\mathcal {A}}_{ij})_{i,j = 1}^N\) is a symmetric \(N\times N\) matrix with independent upper triangular entries \(({\mathcal {A}}_{ij} :i \leqslant j)\) satisfying

$$\begin{aligned} {{\mathcal {A}}}_{ij}={\left\{ \begin{array}{ll} 1 &{}\quad {\text {with probability }} p \\ 0 &{}\quad {\text {with probability }} 1-p. \end{array}\right. } \end{aligned}$$

We introduce the normalized adjacency matrix

$$\begin{aligned} A:=\sqrt{\frac{1}{p(1-p)N}}\,{\mathcal {A}}, \end{aligned}$$
(1.1)

where the normalization is chosen so that the eigenvalues of A are typically of order one.
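
As a purely illustrative aside (ours, not part of the analysis below; the function name, parameters, and seed are arbitrary choices), the normalization (1.1) is easy to check numerically by sampling \({\mathcal {G}}(N,p)\) directly:

```python
# Illustrative sketch (ours): sample G(N, p) and form the normalized
# adjacency matrix A of (1.1).
import numpy as np

rng = np.random.default_rng(0)

def normalized_adjacency(N, p, rng):
    # Independent Bernoulli(p) entries on and above the diagonal, symmetrized.
    U = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    calA = U + U.T + np.diag((rng.random(N) < p).astype(float))
    return calA / np.sqrt(p * (1 - p) * N)   # the normalization (1.1)

N, p = 2000, 0.05
evals = np.linalg.eigvalsh(normalized_adjacency(N, p, rng))
print(evals[-1], np.sqrt(N * p))  # largest (Perron-Frobenius) eigenvalue ~ sqrt(Np)
print(evals[0], evals[-2])        # the remaining eigenvalues are of order one
```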

The goal of this paper is to obtain the asymptotic distribution of the extreme eigenvalues of A. The extreme eigenvalues of graphs are of fundamental importance in spectral graph theory and have attracted much attention in the past 30 years; see for instance [1, 4, 17] for reviews. The Erdős–Rényi graph is the simplest model of a random graph and its adjacency matrix is the canonical example of a sparse random matrix.

Each row and column of A typically has Np nonzero entries, and hence A is sparse whenever \(p \rightarrow 0\) as \(N \rightarrow \infty \). In the complementary dense regime, where p is of order one, A is a Wigner matrix (up to a centring of the entries). The edge statistics of Wigner matrices have been fully understood in [8, 10, 23, 26,27,28], where it was shown that the distribution of the largest eigenvalue is asymptotically given by the GOE Tracy–Widom distribution [29, 30].

To discuss the edge statistics of A in the sparse regime, we introduce the following conventions. Unless stated otherwise, all quantities depend on the fundamental parameter N, and we omit this dependence from our notation. We write \(X \ll Y\) to mean \(X = O_\varepsilon (N^{-\varepsilon } Y)\) for some fixed \(\varepsilon > 0\). We write \(X \asymp Y\) to mean \(X = O(Y)\) and \(Y = O(X)\). We denote the eigenvalues of A by \(\lambda _1 \leqslant \cdots \leqslant \lambda _N\). The largest eigenvalue \(\lambda _N\) of A is its Perron–Frobenius eigenvalue. For \(Np \gg 1\), it is typically of order \(\sqrt{Np}\), while the other eigenvalues \(\lambda _1, \lambda _2, \ldots , \lambda _{N-1}\) are typically of order one.

The edge statistics of sparse matrices were first studied in [8, 9], where it was proved that when \(Np \gg N^{2/3}\) the second largest eigenvalue of A exhibits GOE Tracy–Widom fluctuations, i.e.

$$\begin{aligned} \lim _{N\rightarrow \infty } {\mathbb {P}}\big (N^{2/3}(\lambda _{N-1}-{\mathbb {E}} \lambda _{N-1})\leqslant s\big )=F_1(s), \end{aligned}$$

where \(F_1(s)\) is the distribution function of the GOE Tracy–Widom distribution. In [24], this result was extended to \(Np \gg N^{1/3}\), which turns out to be optimal. Indeed, in [19] it was shown that when \(N^{2/9}\ll Np \ll N^{1/3}\) the Tracy–Widom distribution for \(\lambda _{N - 1}\) no longer holds, and the extreme eigenvalues have asymptotically Gaussian fluctuations. More precisely, in [19] it was shown that if \(N^{2/9}\ll Np \ll N^{1/3}\) then

$$\begin{aligned} \sqrt{\frac{N^2 p}{2}}(\lambda _{N-1}-{\mathbb {E}} \lambda _{N-1}) \overset{\mathrm {d}}{\longrightarrow }{\mathcal {N}}(0,1). \end{aligned}$$
(1.2)

In this paper we show (1.2) for the whole range \(1 \ll Np \ll N^{1/3}\). In fact, we show this for a general class of sparse random matrices introduced in [8, 9]. It is easy to check that the normalized adjacency matrix A (1.1) of \({\mathcal {G}}(N,p)\) satisfies the following definition with the choice

$$\begin{aligned} q:=\sqrt{Np} . \end{aligned}$$
(1.3)

Definition 1.1

(Sparse matrix). Let \(1 \leqslant q \leqslant \sqrt{N}\). A sparse matrix is a real symmetric \(N\times N\) matrix \(H=H^* \in {\mathbb {R}}^{N \times N}\) whose entries \(H_{ij}\) satisfy the following conditions.

  1. (i)

    The upper-triangular entries (\(H_{ij}:1 \leqslant i \leqslant j\leqslant N\)) are independent.

  2. (ii)

    We have \({\mathbb {E}} H_{ij}=0\), \( {\mathbb {E}} H_{ij}^2=(1+O(\delta _{ij}))/N\), and \({\mathbb {E}} H_{ij}^4\asymp 1/(Nq^2)\) for all i, j.

  3. (iii)

    For any \(k\geqslant 3\), we have \({\mathbb {E}}|H_{ij}|^k \leqslant C_k/ (Nq^{k-2})\) for all i, j.

We define the random matrix

$$ A = H + f {{\mathbf {e}}} {{\mathbf {e}}}^*, $$

where \({{\mathbf {e}}} :=N^{-1/2}(1,1,\ldots ,1)^*\), and \(f \geqslant 0\).
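
For concreteness, the following sketch (ours) records how the normalized ER matrix from (1.1) fits Definition 1.1: a short computation gives \(H_{ij}=({\mathcal {A}}_{ij}-p)/\sqrt{p(1-p)N}\) and \(f=Np/\sqrt{p(1-p)N}=\sqrt{Np/(1-p)}\), and the moment conditions in (ii) can be checked analytically.

```python
# Sketch (ours): the ER matrix A of (1.1) fits Definition 1.1 with
# H_ij = (calA_ij - p)/sqrt(p(1-p)N) and f = sqrt(Np/(1-p)).
import numpy as np

N, p = 2000, 0.05
q = np.sqrt(N * p)               # the choice (1.3)
f = np.sqrt(N * p / (1 - p))     # coefficient of the rank-one part f*e*e^T

# Moment conditions (ii) for an off-diagonal entry, evaluated analytically:
var = p * (1 - p) / (p * (1 - p) * N)                                # E H_ij^2
m4 = (p * (1 - p) ** 4 + (1 - p) * p ** 4) / (p * (1 - p) * N) ** 2  # E H_ij^4
print(var, 1 / N)            # E H_ij^2 = 1/N exactly
print(m4 * N * q ** 2)       # O(1), i.e. E H_ij^4 is comparable to 1/(N q^2)
```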

For simplicity of presentation, in this paper we focus only on real matrices, although our results and proofs extend to matrices with complex entries with minor modifications which we omit; see also Remark 8.2 below.

To describe the fluctuations of the eigenvalues of A, we define the random variable

$$\begin{aligned} {\mathcal {Z}} :=\frac{1}{N}{{\,\mathrm{Tr}\,}}H^2-1. \end{aligned}$$
(1.4)

Defining

$$\begin{aligned} \Sigma :=\left( \frac{1}{N^2}\sum _{i,j}{\mathbb {E}} H_{ij}^4\right) ^{1/2}, \end{aligned}$$

one easily finds

$$\begin{aligned} \frac{1}{\sqrt{2}\Sigma }{\mathcal {Z}}\overset{\mathrm {d}}{\longrightarrow }{\mathcal {N}}(0,1) \quad \text {and} \quad \Sigma \asymp \frac{1}{\sqrt{N}q} . \end{aligned}$$
(1.5)
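
The convergence (1.5) is easy to observe numerically. The following Monte Carlo sketch (ours; the parameters are arbitrary) compares the sample standard deviation of \({\mathcal {Z}}\) with \(\sqrt{2}\Sigma \) for the centred ER model, using that \({{\,\mathrm{Tr}\,}}H^2=\sum _{i,j}H_{ij}^2\) for symmetric H.

```python
# Sketch (ours): Monte Carlo check of (1.5) for the centred ER model.
import numpy as np

rng = np.random.default_rng(1)
N, p = 500, 0.02
Z = []
for _ in range(500):
    U = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    calA = U + U.T + np.diag((rng.random(N) < p).astype(float))
    H = (calA - p) / np.sqrt(p * (1 - p) * N)  # centred, E H_ij^2 = 1/N
    Z.append((H * H).sum() / N - 1.0)          # Z = Tr(H^2)/N - 1
Z = np.array(Z)
m4 = (p * (1 - p) ** 4 + (1 - p) * p ** 4) / (p * (1 - p) * N) ** 2
Sigma = np.sqrt(m4)       # all entries have (essentially) the same 4th moment
print(Z.std(), np.sqrt(2) * Sigma)   # close, as predicted by (1.5)
```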

We denote by \(\gamma _{\mathrm {sc},i}\) the ith N-quantile of the semicircle distribution, which is the limiting empirical eigenvalue measure of A for \(Np \gg 1\). Explicitly, \(\int _{-2}^{\gamma _{\mathrm {sc},i}}\frac{1}{2\pi }\sqrt{4-x^2} \, \mathrm {d}x =\frac{i}{N}\,\).
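
The quantiles \(\gamma _{\mathrm {sc},i}\) can be computed by numerically inverting the semicircle distribution function \(F(x) = \frac{1}{2} + \frac{x\sqrt{4-x^2}}{4\pi } + \frac{1}{\pi }\arcsin (x/2)\); a minimal sketch (ours):

```python
# Sketch (ours): the N-quantiles gamma_sc,i, obtained by inverting the
# semicircle CDF F(x) = 1/2 + x*sqrt(4 - x^2)/(4*pi) + arcsin(x/2)/pi.
import numpy as np
from scipy.optimize import brentq

def semicircle_cdf(x):
    return 0.5 + x * np.sqrt(4.0 - x ** 2) / (4 * np.pi) + np.arcsin(x / 2) / np.pi

def gamma_sc(i, N):
    return brentq(lambda x: semicircle_cdf(x) - i / N, -2.0, 2.0)

N = 1000
print(gamma_sc(N // 2, N))   # = 0 by symmetry
print(gamma_sc(N - 1, N))    # close to the upper edge 2
```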

Throughout the following we fix an exponent \(\beta \in (0,1/2]\) and set

$$\begin{aligned} q = N^\beta . \end{aligned}$$
(1.6)

If A is the normalized adjacency matrix (1.1) of \({\mathcal {G}}(N,p)\) then from (1.3) and (1.6) we find that the condition \(1 \ll Np \ll N^{1/3}\) reads \(1 \ll q \ll N^{1/6}\), i.e. \(\beta \in (0,1/6)\). We may now state our main result.

Theorem 1.2

Fix \(\beta \in (0,1/6)\) and set

$$\begin{aligned} \delta \equiv \delta (\beta ):=\frac{1}{10} \min \{\beta ,1/6-\beta \}. \end{aligned}$$
(1.7)

Let H be as in Definition 1.1 with q given by (1.6). Fix \(\varepsilon > 0\) and \(D > 0\). Then for large enough N we have with probability at least \(1 - N^{-D}\)

$$\begin{aligned} \Big |\lambda _i-{\mathbb {E}} \lambda _i-\frac{\gamma _{\mathrm {sc},i}}{2} {\mathcal {Z}} \Big | = O(N^{\varepsilon - \delta } \Sigma ) \end{aligned}$$
(1.8)

for all \(1\leqslant i \leqslant N-1\).

Theorem 1.2 implies, for all \(i\in \{1,2,\ldots ,N-1\}\) such that \(\gamma _{\mathrm {sc},i}\) is bounded away from 0, that the fluctuations of \(\lambda _i\) are simultaneously governed by those of \({\mathcal {Z}}\). In fact, by the rigidity result of [9, Theorem 2.13] and a simple moment estimate of \({\mathcal {Z}}\) [see (2.5) below], we deduce from (1.5) and Theorem 1.2 that, under the conditions of the latter, with probability at least \(1 - N^{-D}\) we have

$$\begin{aligned} \lambda _i = {\mathbb {E}}\lambda _i \biggl ({1 + \frac{{\mathcal {Z}}}{2}}\biggr ) + O ({N^{- \delta /2} \Sigma }) \end{aligned}$$
(1.9)

for all \(i = 1, \ldots , N - 1\). Thus, for \(1 \ll q \ll N^{1/6}\), the fluctuations of all eigenvalues away from 0 are given by a global random scaling by the factor \(1 + {\mathcal {Z}}/2\).
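
As a numerical illustration of (1.9) (a sketch of ours; at moderate N the asymptotic regime is only roughly attained), one can check that across independent samples the fluctuation of \(\lambda _{N-1}\) is strongly correlated with \({\mathcal {Z}}\):

```python
# Sketch (ours): numerical illustration of (1.9).  Across samples, the
# fluctuation of lambda_{N-1} is strongly correlated with Z; at this moderate
# N the regime 1 << Np << N^{1/3} is only roughly attained.
import numpy as np

rng = np.random.default_rng(2)
N, p = 500, 0.01
lam, Z = [], []
for _ in range(300):
    U = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    calA = U + U.T + np.diag((rng.random(N) < p).astype(float))
    A = calA / np.sqrt(p * (1 - p) * N)
    H = A - p / np.sqrt(p * (1 - p) * N)   # subtract the (constant) mean matrix
    lam.append(np.linalg.eigvalsh(A)[-2])  # lambda_{N-1}
    Z.append((H * H).sum() / N - 1.0)
print(np.corrcoef(lam, Z)[0, 1])           # close to 1
```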

Remark 1.3

If \(f = 0\) in Definition 1.1, i.e. \(A = H\) is centred, then the conclusion of Theorem 1.2 holds for all eigenvalues \(\lambda _1, \ldots , \lambda _N\). Indeed, if \(f = 0\) then A and \(-A\) both satisfy Definition 1.1, and \(\lambda _N(A) = - \lambda _1(-A)\).

Our main result is a rigidity estimate for the eigenvalues of A with accuracy

$$\begin{aligned} \frac{1}{N^{1/2 + \delta /2} q}. \end{aligned}$$

In contrast, the corresponding rigidity results of [9, 19, 24] have accuracy up to a fixed power of \(q^{-1}\): up to \(q^{-2}\) in [9], \(q^{-4}\) in [24], and \(q^{-6}\) in [19]. For arbitrarily small polynomial values of q, the rigidity provided by an expansion up to a fixed power of \(q^{-1}\) is not sufficient to analyse the fluctuations of the extreme eigenvalues. Thus, the main technical achievement of our paper is the avoidance of \(q^{-1}\)-expansions in the error bounds.

Remark 1.4

The variable \({\mathcal {Z}}\) was introduced in [19], where its importance for the edge fluctuations of sparse random matrices was first recognized. Using it, the authors proved (1.8) for \(\beta \in (1/9,1/6)\).

Remark 1.5

Let A be the rescaled adjacency matrix (1.1) of \({\mathcal {G}}(N,p)\). The fluctuations of the eigenvalues of A have a particularly transparent interpretation in terms of the fluctuation of the average degree of \({\mathcal {G}}(N,p)\), or, equivalently, its total number of edges. To that end, denote by \({\mathcal {D}} :=\frac{1}{N} \sum _{i,j} {\mathcal {A}}_{ij}\) the average degree of \({\mathcal {G}}(N,p)\) and by \(d :={\mathbb {E}}{\mathcal {D}} = Np\) its expectation. Defining the randomly rescaled adjacency matrix

$$\begin{aligned} {\widehat{A}} :=\frac{1}{\sqrt{{\mathcal {D}}}} {\mathcal {A}}, \end{aligned}$$
(1.10)

we claim that under the assumptions of Theorem 1.2 we have

$$\begin{aligned} \lambda _i({\widehat{A}}) = {\mathbb {E}}\lambda _i(A) + O(N^{-\delta /2} \Sigma ) \end{aligned}$$
(1.11)

with probability at least \(1 - N^{-D}\). Indeed, a short calculation yields \({\mathcal {D}} = d \bigl ({1 + (1 - p) {\mathcal {Z}} + O (p)}\bigr )\), from which (1.11) follows using (1.9) and the bounds \(p = O(N^{-\delta } \Sigma )\) and \(|{\mathcal {Z}} |^2 = O(N^{-\delta } \Sigma )\) with probability at least \(1 - N^{-D}\) [by (2.5) below].

In (1.11), the Gaussian fluctuations (1.9) present for \(\lambda _i(A)\) are absent for \(\lambda _i({\widehat{A}})\). Hence, the fluctuations of the eigenvalues of A can all be simultaneously eliminated to leading order by an appropriate random rescaling. Note that we can write \(A = d^{-1/2} {\mathcal {A}} (1 + O(N^{-\delta } \Sigma ))\), in analogy to (1.10). Thus, (1.11) states that if one replaces the deterministic normalization \(d^{-1/2}\) with the random normalization \({\mathcal {D}}^{-1/2}\), the fluctuations vanish to leading order. In fact, although it is not formulated that way, our proof can essentially be regarded as a rigidity result for the matrix \({\widehat{A}}\).
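
The suppression of the fluctuations in (1.11) is also easy to observe numerically; in the following sketch (ours, with arbitrary parameters) the sample spread of \(\lambda _{N-1}({\widehat{A}})\) comes out much smaller than that of \(\lambda _{N-1}(A)\).

```python
# Sketch (ours): the same second-largest eigenvalue under the deterministic
# normalization (p(1-p)N)^{-1/2} and the random normalization D^{-1/2} of
# (1.10).  The sample spread is much smaller in the second case, cf. (1.11).
import numpy as np

rng = np.random.default_rng(3)
N, p = 500, 0.01
lamA, lamAhat = [], []
for _ in range(300):
    U = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    calA = U + U.T + np.diag((rng.random(N) < p).astype(float))
    D = calA.sum() / N                     # average degree, E D = Np
    mu = np.linalg.eigvalsh(calA)[-2]
    lamA.append(mu / np.sqrt(p * (1 - p) * N))
    lamAhat.append(mu / np.sqrt(D))
print(np.std(lamA), np.std(lamAhat))       # the second is markedly smaller
```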

Remark 1.5 is consistent with the fact that for more rigid graph models where the average degree is fixed, \({\mathcal {Z}}\) does not appear: for a random d-regular graph, the second largest eigenvalue of the adjacency matrix has Tracy–Widom fluctuations for \(N^{2/9}\ll d \ll N^{1/3}\) [2]. Moreover, in [19] it was proved that the second largest eigenvalue of \({\widehat{A}}\) has Tracy–Widom fluctuations for \(q \gg N^{1/9}\).

Theorem 1.2 trivially implies the following result.

Corollary 1.6

We adopt the conditions in Theorem 1.2. Fix \(\varepsilon >0\). Define

$$\begin{aligned} X_{i}:=\frac{\lambda _{i}-{\mathbb {E}} \lambda _{i}}{\gamma _{\mathrm {sc},i}\Sigma /\sqrt{2}} \end{aligned}$$

for all \(i \in \{1,2,\ldots , \lfloor {(\frac{1}{2}-\varepsilon ) N} \rfloor , \lfloor {(\frac{1}{2}+\varepsilon ) N} \rfloor ,\ldots ,N-1\}=:{\mathcal {I}}\). We have

$$\begin{aligned} (X_{i_1},\ldots ,X_{i_k}) \overset{\mathrm {d}}{\longrightarrow }{\mathcal {N}}_k({{{\mathbf {0}}}}, {\mathcal {J}}), \end{aligned}$$

for all fixed k and \(i_1,\ldots ,i_k \in {\mathcal {I}}\). Here \({\mathcal {J}}\in {{\mathbb {R}}}^{k\times k}\) is the matrix of ones, i.e. \({\mathcal {J}}_{ij}= 1\) for all \(i,j\in \{1,2,\ldots ,k\}\).

Next, we remark on the fluctuations of single eigenvalues inside the bulk. This problem was first addressed in [11] for GUE, extended to GOE in [25], and recently extended to general Wigner matrices in [3, 21]. In these works, it was proved that the bulk eigenvalues of Wigner matrices fluctuate on the scale \(\sqrt{\log N}/N\). More precisely,

$$\begin{aligned} \frac{\mu _i-\gamma _{\mathrm {sc},i}}{\sqrt{\frac{8\log N}{\big (4-\gamma _{\mathrm {sc},i}^2\big )N^2}}} \overset{\mathrm {d}}{\longrightarrow }{\mathcal {N}}(0,1) \end{aligned}$$

for all bulk eigenvalues \(\mu _i\), \(\varepsilon N \leqslant i \leqslant (1 - \varepsilon ) N\), of a real Wigner matrix. The bulk eigenvalue fluctuations of sparse matrices were studied in [12], where it was shown that for fixed \(\beta \in (0,1/2)\), there exists \(c\equiv c(\beta )>0\) such that with probability at least \(1 - N^{-D}\)

$$\begin{aligned} \Big |\lambda _i-{\mathbb {E}} \lambda _i-\frac{\gamma _{\mathrm {sc},i}}{2}{\mathcal {Z}} \Big | = O(N^{-c} \Sigma ) \end{aligned}$$

for all bulk eigenvalues \(\lambda _i\), \(\varepsilon N \leqslant i \leqslant (1 - \varepsilon ) N\), of A.

In summary, we have the following general picture of the eigenvalue fluctuations of sparse random matrices. The fluctuations of any single eigenvalue consist of two components: a random matrix component and a sparseness component. The random matrix component is independent of the sparseness and coincides with the corresponding fluctuations of the GOE. It has order \(N^{-2/3}\) at the edge and order \(\sqrt{\log N} / N\) in the bulk. The sparseness component is captured by the random variable \({\mathcal {Z}}\) and has order \(1/(\sqrt{N} q)\) throughout the spectrum except near the origin. Thus, the sparseness component dominates in the bulk as soon as \(q \ll \sqrt{N}\) and at the edge as soon as \(q \ll N^{1/6}\). In fact, our proof suggests that \({\mathcal {Z}}\) is only the leading-order Gaussian contribution arising from the sparseness, and that there is an infinite hierarchy of strongly correlated and asymptotically Gaussian random variables of which \({\mathcal {Z}}\) is the largest and whose magnitudes decrease in powers of \(q^{-2}\). In order to obtain random matrix Tracy–Widom statistics near the edge, one would have to subtract all such contributions up to order \(N^{-2/3}\). For \(q = N^{\beta }\) with \(\beta \) arbitrarily small, the number of such terms becomes arbitrarily large.

For completeness, we mention that the bulk eigenvalue statistics have also been analysed in terms of their correlation functions and eigenvalue spacings, which have a very different behaviour from the single eigenvalue fluctuations described above. It was proved in [8, 9, 18, 22] that the asymptotics of the local eigenvalue correlation functions in the bulk coincide with those of GOE for any \(q \gg 1\). Thus, the sparseness has no impact on the asymptotic behaviour of the correlation functions and the eigenvalue spacings.

We conclude this section with a few words about the proof. The fluctuations of the extreme eigenvalues are considerably harder to analyse than those of the bulk eigenvalues, and in particular the method of [12] breaks down at the edge because the self-consistent equations on which it relies become unstable. The key difficulty near the edge is to obtain strong rigidity estimates on the locations of the extreme eigenvalues, while no such estimates are needed in the bulk. Indeed, the central step of the proof is Proposition 4.1 below, which provides an upper bound for the fluctuations of the largest eigenvalue of H. This is obtained by showing, for suitable E outside the bulk of the spectrum and \(\eta >0\), that the imaginary part of the Green’s function \(G(E+\mathrm {i}\eta ):=(H-E-\mathrm {i}\eta )^{-1}\) satisfies \( {{\,\mathrm{Im}\,}}{{\,\mathrm{Tr}\,}}G(E+\mathrm {i}\eta ) \ll 1/\eta \). Our basic approach is the self-consistent polynomial method for sparse matrices developed in [19, 24]. Thus, we first obtain a highly precise bound on the self-consistent polynomial P of the Green’s function, which provides a good estimate of \({{\,\mathrm{Tr}\,}}G\) outside the bulk. The key observation in this part is that the cancellation built into P persists also in the derivative of P. Armed with the good estimate of \({{\,\mathrm{Tr}\,}}G\), our second key idea is to estimate the imaginary part of P, which turns out to be much smaller than P itself; from this we deduce strong enough bounds on the imaginary part of G. These two estimates together conclude the proof. We refer to Sect. 3 below for more details of the proof strategy.

The rest of the paper is organized as follows. In Sect. 2 we introduce the notations and previous results that we use in this paper. In Sect. 3 we explain the strategy of the proof. In Sect. 4 we prove Theorem 1.2, assuming key rigidity estimates at the edge (Proposition 4.1) and inside the bulk (Lemma 4.2). In Sect. 5 we give a careful construction of the self-consistent polynomial P of the Green’s function. In Sects. 6–8, we prove Proposition 4.1, assuming several improved estimates for large classes of polynomials of Green’s functions. In Sect. 9 we prove Lemma 4.2. Finally, in Sect. 10 we prove the estimates that we used in Sects. 6–8.

2 Preliminaries

In this section we collect notations and tools that will be used. For the rest of this paper we fix \(\beta \in (0,1/6)\) and define \(\delta \) as in (1.7).

Let M be an \(N \times N\) matrix. We denote \(M^{*n}:=(M^{*})^n\), \(M^{*}_{ij}:=(M^{*})_{ij} = {{\overline{M}} \,}_{ji}\), \(M^n_{ij}:=(M_{ij})^n\), and the normalized trace of M by \({{\underline{M}} \,} :=\frac{1}{N} {{\,\mathrm{Tr}\,}}M\). We denote the Green’s function of H by

$$\begin{aligned} G \equiv G(z):=(H-z)^{-1}. \end{aligned}$$
(2.1)

Convention

Throughout the paper, the argument of G and of any Stieltjes transform is always denoted by \(z \in {\mathbb {C}}{\setminus } {\mathbb {R}}\), and we often omit it from our notation.

The Stieltjes transform of the eigenvalue density at z is denoted by \({\underline{G}} \,(z)\). For deterministic z we have the differential rule

$$\begin{aligned} \frac{\partial G_{ij}}{\partial H_{kl}} =-(G_{ik}G_{jl}+G_{il}G_{kj})(1+\delta _{kl})^{-1}. \end{aligned}$$
(2.2)
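
The rule (2.2) encodes that for a symmetric matrix the entries \(H_{kl}\) and \(H_{lk}\) vary together; the following finite-difference sketch (ours, with arbitrary parameters) checks it numerically for \(k \ne l\).

```python
# Sketch (ours): finite-difference check of (2.2) for k != l, where varying
# H_kl also varies H_lk (so delta_kl = 0 and the prefactor is 1).
import numpy as np

rng = np.random.default_rng(4)
N, z, eps = 50, 0.3 + 0.1j, 1e-6
H = rng.standard_normal((N, N)); H = (H + H.T) / np.sqrt(2 * N)

def green(M):
    return np.linalg.inv(M - z * np.eye(N))

i, j, k, l = 0, 1, 2, 3
G = green(H)
Hp = H.copy(); Hp[k, l] += eps; Hp[l, k] += eps   # symmetric perturbation
fd = (green(Hp)[i, j] - G[i, j]) / eps
exact = -(G[i, k] * G[j, l] + G[i, l] * G[k, j])  # right-hand side of (2.2)
print(fd, exact)                                  # agree up to O(eps)
```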

If h is a real-valued random variable with finite moments of all order, we denote by \({\mathcal {C}}_k(h)\) the kth cumulant of h, i.e.

$$\begin{aligned} {\mathcal {C}}_k(h):=(-\mathrm {i})^k \cdot \big (\partial _{\lambda }^k \log {\mathbb {E}} \mathrm {e}^{\mathrm {i}\lambda h}\big ) \big {|}_{\lambda =0}. \end{aligned}$$

We state the cumulant expansion formula, whose proof is given in e.g. [16, Appendix A].

Lemma 2.1

(Cumulant expansion). Let \(f:{\mathbb {R}}\rightarrow {\mathbb {C}}\) be a smooth function, and denote by \(f^{(k)}\) its kth derivative. Then, for every fixed \(\ell \in {\mathbb {N}}\), we have

$$\begin{aligned} {\mathbb {E}}\big [h\cdot f(h)\big ]=\sum _{k=0}^{\ell }\frac{1}{k!}\mathcal {C}_{k+1}(h){\mathbb {E}}[f^{(k)}(h)]+{\mathcal {R}}_{\ell +1}, \end{aligned}$$
(2.3)

assuming that all expectations in (2.3) exist, where \({\mathcal {R}}_{\ell +1}\) is a remainder term (depending on f and h), such that for any \(t>0\),

$$\begin{aligned} {\mathcal {R}}_{\ell +1}= & {} O(1) \cdot \left( {\mathbb {E}}\sup _{|x| \leqslant |h|} \big |f^{(\ell +1)}(x)\big |^2 \cdot {\mathbb {E}}\,\big | h^{2\ell +4} {\mathbf {1}}_{|h|>t} \big | \right) ^{1/2} \nonumber \\&+O(1) \cdot {\mathbb {E}} |h|^{\ell +2} \cdot \sup _{|x| \leqslant t}\big |f^{(\ell +1)}(x)\big |. \end{aligned}$$
(2.4)
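
As a minimal illustration (ours): for a standard Gaussian h one has \({\mathcal {C}}_2(h)=1\) and \({\mathcal {C}}_k(h)=0\) for \(k \geqslant 3\), so (2.3) truncates exactly to Stein's identity \({\mathbb {E}}[h f(h)]={\mathbb {E}}[f'(h)]\), which is easy to test by Monte Carlo.

```python
# Sketch (ours): for h ~ N(0,1), C_2(h) = 1 and C_k(h) = 0 for k >= 3, so the
# expansion (2.3) truncates exactly to Stein's identity E[h f(h)] = E[f'(h)].
import numpy as np

rng = np.random.default_rng(5)
h = rng.standard_normal(10 ** 6)
f = np.tanh                              # a smooth bounded test function
fp = lambda x: 1.0 / np.cosh(x) ** 2     # its derivative
print((h * f(h)).mean(), fp(h).mean())   # agree up to Monte Carlo error
```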

The following result gives bounds on the cumulants of the entries of H, whose proof follows from Definition 1.1 and the homogeneity of the cumulants.

Lemma 2.2

For every \(k \in {\mathbb {N}}\) we have

$$\begin{aligned} {\mathcal {C}}_{k}(H_{ij})=O_{k}\big (1/\big (Nq^{k-2}\big )\big ) \end{aligned}$$

uniformly for all i, j.

We use the following convenient notion of high-probability bound from [7].

Definition 2.3

(Stochastic domination). Let

$$X=\bigl ({X^{(N)}(u):N \in {{\mathbb {N}}}, u \in U^{(N)}}\bigr ),\qquad Y=\bigl ({Y^{(N)}(u):N \in {{\mathbb {N}}}, u \in U^{(N)}}\bigr )$$

be two families of random variables, where \(Y^{(N)}(u)\) are nonnegative and \(U^{(N)}\) is a possibly N-dependent parameter set. We say that X is stochastically dominated by Y, uniformly in u, if for all (small) \(\varepsilon >0\) and (large) \(D>0\) we have

$$\begin{aligned} \sup \limits _{u \in U^{(N)}} {{\mathbb {P}}}\left[ \big |X^{(N)}(u)\big | > N^{\varepsilon } Y^{(N)}(u) \right] \leqslant N^{-D} \end{aligned}$$

for large enough \(N \geqslant N_0(\varepsilon ,D)\). If X is stochastically dominated by Y, uniformly in u, we use the notation \(X \prec Y\), or, equivalently \(X=O_{\prec }(Y)\).

Note that for deterministic X and Y, \(X =O_\prec (Y)\) means \(X= O_{\varepsilon }(N^{\varepsilon }Y)\) for any \(\varepsilon > 0\). Sometimes we say that an event \(\Xi \equiv \Xi ^{(N)}\) holds with very high probability if for all \(D > 0\) we have \({\mathbb {P}}(\Xi ) \geqslant 1 - N^{-D}\) for \(N \geqslant N_0(D)\).

By estimating the moments of \({\mathcal {Z}}\) defined in (1.4) and invoking Chebyshev’s inequality, we find

$$\begin{aligned} {\mathcal {Z}} \prec \frac{1}{\sqrt{N}q}. \end{aligned}$$
(2.5)

We have the following elementary result about stochastic domination.

Lemma 2.4

  1. (i)

    If \(X_1 \prec Y_1\) and \(X_2 \prec Y_2\) then \(X_1 X_2 \prec Y_1 Y_2\).

  2. (ii)

    Suppose that X is a nonnegative random variable satisfying \(X \leqslant N^C\) and \(X \prec \Phi \) for some deterministic \(\Phi \geqslant N^{-C}\). Then \({\mathbb {E}}X \prec \Phi \).

Fix (a small) \(c>0\) and define the spectral domains

$$\begin{aligned}&{{\mathbf {S}}} :=\{z=E + \mathrm {i}\eta :|E |\leqslant 10, N^{-1 + c}\leqslant \eta \leqslant 10\}, \nonumber \\&\widetilde{{{\mathbf {S}}}}:=\{z=E+\mathrm {i}\eta :|E|\leqslant 10, 0<\eta \leqslant 10\}. \end{aligned}$$
(2.6)

We recall the local semicircle law for Erdős–Rényi graphs from [9].

Proposition 2.5

([9, Theorem 2.8]). Let H be a sparse matrix defined as in Definition 1.1, and \(m_{\mathrm {sc}}\) be the Stieltjes transform of the semicircle distribution. We have

$$\begin{aligned} \max \limits _{i,j}|G_{ij}(z)-\delta _{ij}m_{\mathrm {sc}}(z)| \prec \frac{1}{q}+\sqrt{\frac{1}{N\eta }} \end{aligned}$$

uniformly in \(z = E+\mathrm {i}\eta \in {{\mathbf {S}}}\).

As a standard consequence of the local law, we have the complete delocalization of eigenvectors.

Lemma 2.6

Let \({{\mathbf {u}}}_1,\ldots ,{{\mathbf {u}}}_N\) be the (\(L^2\)-normalized) eigenvectors of H. We have

$$\begin{aligned} {{\mathbf {u}}}_i(k)^2 \prec \frac{1}{N} \end{aligned}$$

uniformly for all \(i,k \in \{1,2,\ldots ,N\}\).

Remark 2.7

Proposition 2.5 was proved in [9] under the additional assumption \({\mathbb {E}} H^2_{ii}=1/N\) for all i. However, the proof is insensitive to the variance of the diagonal entries, and one can easily repeat the steps in [9] under the general assumption \({\mathbb {E}} H_{ii}^2=C_i/N\). A weak local law for H with general variances on the diagonal can also be found in [15].

Lemma 2.8

(Ward identity). We have

$$\begin{aligned} \sum _j |G_{ij}|^2 =\frac{{{\,\mathrm{Im}\,}}G_{ii}}{\eta } \end{aligned}$$

for all \(z=E+\mathrm {i}\eta \in {{\mathbf {S}}}\).
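
Since the Ward identity is an exact algebraic consequence of \(G-G^*=2\mathrm {i}\eta \, GG^*\), it can be verified directly on any sample; a quick sketch (ours):

```python
# Sketch (ours): direct check of the Ward identity on a random sample.
import numpy as np

rng = np.random.default_rng(6)
N, E, eta = 100, 0.5, 0.01
H = rng.standard_normal((N, N)); H = (H + H.T) / np.sqrt(2 * N)
G = np.linalg.inv(H - (E + 1j * eta) * np.eye(N))
i = 0
print(np.sum(np.abs(G[i]) ** 2), G[i, i].imag / eta)   # equal up to rounding
```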

The following Lemmas 2.9–2.12 characterize the asymptotic eigenvalue density of H. The proof of the following result is postponed to Sect. 5.

Lemma 2.9

There exists a deterministic polynomial

$$\begin{aligned} P_0(z,x)=1+zx+x^2+\frac{a_2}{q^2}x^4+\frac{a_3}{q^4}x^6+\cdots \end{aligned}$$

of degree \(2\lceil {\beta ^{-1}} \rceil \) such that

$$\begin{aligned} {\mathbb {E}} P_0(z,{\underline{G}} \,(z))\prec \frac{{\mathbb {E}}{{\,\mathrm{Im}\,}}{\underline{G}} \,(z)}{N\eta }+\frac{1}{N} \end{aligned}$$

uniformly for all deterministic \(z \in {{\mathbf {S}}}\). Here \(a_2, a_3, \ldots \) are real, deterministic, and bounded. They depend on the law of H.

Lemma 2.9 states that when x is replaced with \({{\underline{G}} \,}(z)\), the expectation of \(P_0(z,x)\) is very small. This is because of a cancellation built into \(P_0\), which however holds only in expectation and not with high probability. The following two results are essentially proved in [19, Propositions 2.5–2.6], and we state them without proof. We denote by \({\mathbb {C}}_+\) the complex upper half-plane.

Lemma 2.10

There exists a deterministic algebraic function \(m_0:{\mathbb {C}}_+ \rightarrow {\mathbb {C}}_+\) satisfying \(P_0(z,m_0(z))=0\), such that \(m_0\) is the Stieltjes transform of a deterministic symmetric probability measure \(\varrho _0\). We have \({{\,\mathrm{supp}\,}}\varrho _0=[-L_0,L_0]\), where

$$\begin{aligned} L_0=2+O(1/q^2). \end{aligned}$$

Moreover,

$$\begin{aligned} {{\,\mathrm{Im}\,}}m_0(z)\asymp {\left\{ \begin{array}{ll} \sqrt{\tau _0+\eta } &{}\quad \text {if} \;\, E \in [-L_0,L_0] \\ \frac{\eta }{\sqrt{\tau _0+\eta }} &{}\quad \text {if} \;\, E \notin [-L_0,L_0], \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} |\partial _2 P_0(z,m_0(z))| \asymp \sqrt{\tau _0+\eta }, \quad \quad |\partial ^2_2 P_0(z,m_0(z))|= 2+O\big (q^{-2}\big ) \end{aligned}$$

for all \(z \in {\widetilde{{{{\mathbf {S}}}}}}\), where \(\tau _0 \equiv \tau _0(z):=|E^2-L_0^2|\).

Next, define \(P(z,x):=P_0(z,x)+{\mathcal {Z}} x^2\).

Lemma 2.11

There exists a random algebraic function \(m:{\mathbb {C}}_+ \rightarrow {\mathbb {C}}_+\) satisfying \(P(z,m(z))=0\), such that m is the Stieltjes transform of a random symmetric probability measure \(\varrho \). We have \({{\,\mathrm{supp}\,}}\varrho =[-L,L]\), where

$$\begin{aligned} L=L_0+{\mathcal {Z}}+O_{\prec }\Big (\frac{1}{\sqrt{N}q^3}\Big ). \end{aligned}$$

Moreover,

$$\begin{aligned} {{\,\mathrm{Im}\,}}m(z)\asymp {\left\{ \begin{array}{ll} \sqrt{\tau +\eta } &{}\quad \text {if} \quad E \in [-L,L] \\ \frac{\eta }{\sqrt{\tau +\eta }} &{} \quad \text {if} \quad E \notin [-L,L], \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} |\partial _2 P(z,m(z))| \asymp \sqrt{\tau +\eta }, \quad \quad |\partial ^2_2 P(z,m(z))|= 2+O\big (q^{-2}\big ) \end{aligned}$$
(2.7)

for all \(z \in {\widetilde{{{{\mathbf {S}}}}}}\), where \(\tau \equiv \tau (z) :=|E^2-L^2|\).

Let \(\gamma _i\) denote the ith N-quantile of \(\varrho \), i.e.

$$\begin{aligned} \int _{-L}^{\gamma _i} \varrho (x)\,\mathrm {d}x =\frac{i}{N}. \end{aligned}$$

Similarly, let \(\gamma _{0,i}\) and \(\gamma _{\mathrm {sc},i}\) denote the ith N-quantile of \(\varrho _0\) and the semicircle distribution respectively. We have the following result, whose proof is given in Appendix A below.

Lemma 2.12

We have

$$\begin{aligned} \gamma _i =\gamma _{0,i} +\frac{\gamma _{\mathrm {sc},i}}{2}{\mathcal {Z}} +O_{\prec }\left( \frac{1}{\sqrt{N}q^3}\right) \end{aligned}$$

uniformly for \(i \in \{1,2,\ldots ,N\}\). Here \(\gamma _{0,i}\) is deterministic and satisfies \(\gamma _{0,i}=\gamma _{\mathrm {sc},i}+O(q^{-2})\).

3 Outline of the proof

In this section we describe the strategy of the proof. The foundation of the proof is the method of recursive self-consistent estimates for high moments using the cumulant expansion introduced in [13], building on the previous works [5, 6, 20]. It was first used to study sparse matrices in [24], which also introduced the important idea of estimating moments of a self-consistent polynomial in the trace of the Green’s function. There, the authors derived a precise local law near the edge and obtained the extreme eigenvalue fluctuations for \(p \gg N^{-2/3}\). Subsequently, in [19], by developing the key insight that for \(N^{-7/9} \ll p \ll N^{-2/3}\) the leading fluctuations are fully captured by the random variable \({\mathcal {Z}}\) from (1.4), the authors obtained the extreme eigenvalue fluctuations for \(N^{-7/9}\ll p \ll N^{-2/3}\). In this paper we use the same basic strategy as [19, 24]. As in most results on the extreme eigenvalue statistics, the main difficulty is to establish rigidity bounds for the extreme eigenvalues.

The proof of Theorem 1.2 consists of essentially two separate results: an upper bound on the largest eigenvalue of H (Proposition 4.1 below) and a rigidity estimate in the bulk (Lemma 4.2 below). The latter is a modification of [19, Proposition 2.9], and our main task is to show the former.

We use the random spectral parameter \(z=L_0+{\mathcal {Z}} +w\) introduced in [19], where \(w=\kappa +\mathrm {i}\eta \) is deterministic. In order to obtain the estimate of Proposition 4.1 for the largest eigenvalue of H using the Green’s function, one has to preclude the existence of an eigenvalue near \({{\,\mathrm{Re}\,}}z\) for a suitable z, which follows provided one can show

$$\begin{aligned} {{\,\mathrm{Im}\,}}{\underline{G}} \, \ll \frac{1}{N\eta } \end{aligned}$$
(3.1)

[see (6.5) and the discussions afterwards for more details]. The proof of (3.1) constitutes the main work of the paper. It relies on the following key new ideas.

  1. 1.

    In the previous works [19, 24], following the work [10] on Wigner matrices, (3.1) is always proved using

    $$\begin{aligned} {{\,\mathrm{Im}\,}}{\underline{G}} \,\leqslant {{\,\mathrm{Im}\,}}m + |{\underline{G}} \,-m| \end{aligned}$$

    and estimating the two terms on the right-hand side separately. There, the term \(|{\underline{G}} \,-m|\) is estimated by obtaining an estimate on \(|P(z, {{\underline{G}} \,}) |\), from which an estimate on \(|{\underline{G}} \,-m|\) follows by inverting a self-consistent equation associated with the polynomial P. In our current setting, \(|{{\underline{G}} \,} - m |\) turns out to be much larger than \({{\,\mathrm{Im}\,}}{\underline{G}} \,\) and hence this approach does not work. Thus, we have to estimate \(|{{\,\mathrm{Im}\,}}({{\underline{G}} \,} - m)|\) instead of \(|{\underline{G}} \,-m|\) and take advantage of the fact that it is much smaller than \(|{\underline{G}} \,-m|\). To that end, we first estimate \(|{{\,\mathrm{Im}\,}}P(z, {{\underline{G}} \,}) |\) by exploiting a crucial cancellation arising from taking the imaginary part, which yields stronger bounds on \(|{{\,\mathrm{Im}\,}}P(z,{{\underline{G}} \,}) |\) than are possible for \(|P(z, {{\underline{G}} \,}) |\).

  2. 2.

    To estimate \(|{{\,\mathrm{Im}\,}}({{\underline{G}} \,} - m) |\) from \(|{{\,\mathrm{Im}\,}}P(z, {{\underline{G}} \,}) |\), we have to invert a self-consistent equation associated with \({{\,\mathrm{Im}\,}}P\). This equation is only stable provided that \(|{{\underline{G}} \,} - m |\) is small enough.

  3. 3.

    The main work is to derive a strong enough bound on \(|{{\underline{G}} \,} - m |\) to ensure the stability of the self-consistent equation for \({{\,\mathrm{Im}\,}}({{\underline{G}} \,} - m)\). The precision required for this step is much higher than that obtained in [19]. Our starting point is the same as in [19, 24]: estimating high moments \({\mathbb {E}}|P |^{2n}\) of \(P \equiv P(z,{\underline{G}} \,)\) using the cumulant expansion. Note that P is constructed in such a way that the expectation \({\mathbb {E}}P(z, {{\underline{G}} \,})\) is very small by a near-exact cancellation (see Lemma 2.9). In the high moments, the interactions between different factors of P and \({{\overline{P}} \,}\), corresponding to the fluctuations of P, give rise to error terms whose control is the key difficulty of the proof. They cannot be estimated naively and have to be re-expanded to arbitrarily high order using a recursive application of the cumulant expansion. These error terms typically contain the partial derivative \(\partial _2 P\) of P in the second argument \({{\underline{G}} \,}\). As soon as P is differentiated, the cancellation built into P is lost. However, we nevertheless need to exploit remnants of this cancellation that are inherited by these higher-order terms containing derivatives of P. We track them by rewriting the partial derivative \(\partial _2 P\) in terms of the derivative \(\partial _w P = \partial _1 P + \partial _2 P \partial _w {{\underline{G}} \,}\) and an error term, and then use that \(\partial _w\) commutes with the derivative \(\frac{\partial }{\partial H_{ij}}\) from the cumulant expansion to obtain a form where the cancellation from the next cumulant expansion is obvious also for the derivative of P.

Let us explain the above points in more detail. The proof of (3.1) contains two steps. The main step is to bound the high moments of P in Proposition 6.1. We start with

$$\begin{aligned} {\mathbb {E}} |P|^{2n}=\frac{1}{N}\sum _{i,j} {\mathbb {E}} H_{ij}G_{ji}P^{n-1}P^{*n}+{\mathbb {E}} (P-{\underline{HG}} \,) P^{n-1}P^{*n}. \end{aligned}$$

We expand the first term on the right-hand side by Lemma 2.1 to get

$$\begin{aligned} {\mathbb {E}} |P|^{2n}&=\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{s=1}^k \binom{k}{s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s( P^{n-1}P^{*n})}{\partial H_{ij}^s} \frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\bigg ] \nonumber \\&\quad +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k} P^{n-1}P^{*n}\bigg ] \nonumber \\&\quad +{\mathbb {E}} (P-{\underline{HG}} \,) P^{n-1}P^{*n}+\text {error}. \end{aligned}$$
(3.2)

Note that the polynomial P is designed such that

$$\begin{aligned} {\mathbb {E}} P= & {} \frac{1}{N}\sum _{i,j} {\mathbb {E}} H_{ij}G_{ji}+{\mathbb {E}} (P-{\underline{HG}} \,)\\= & {} \frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k}\bigg ] \\&+\,{\mathbb {E}} (P-{\underline{HG}} \,) +\text {error}\approx 0, \end{aligned}$$

and for the same reason there are cancellations between the second and third terms on the right-hand side of (3.2). It turns out that the most dangerous terms on the right-hand side of (3.2) are contained within the first sum. One representative error term, arising from \(k = 3\) and \(s = 2\) in (3.2), is

$$\begin{aligned} \frac{1}{N}\sum _{i,j} {\mathcal {C}}_{4}(H_{ij}) {\mathbb {E}} \Big [ (\partial _2 {{\overline{P}} \,}) N^{-1}(G^{*2})_{ii}G^*_{jj} G_{ii}G_{jj}|P|^{2n-2}\Big ], \end{aligned}$$
(3.3)

which involves the interaction of P and \({{\overline{P}} \,}\) and hence depends on the fluctuations of P.

To get a sharp enough estimate of (3.3), it is not enough to take the absolute value inside the expectation and then estimate \(|\partial _2 {{\overline{P}} \,}|\) and \(|N^{-1}(G^{*2})_{ii}|\) by Lemmas 2.11 and 2.8 respectively. Instead, the key idea is to rewrite the error term, so that it becomes amenable to another expansion step, as

$$\begin{aligned} (\partial _2{{\overline{P}} \,}) N^{-1}(G^{*2})_{ii}G^*_{jj}G_{ii}G_{jj}=N^{-1} \partial _{{{\overline{w}} \,}} P({{\overline{z}} \,} ,{\underline{G}} \,^*) {\underline{G^*}} \,\,{\underline{G}} \,^2+\text {error}, \end{aligned}$$
(3.4)

which comes from the approximations

$$\begin{aligned} (G^{*2})_{ii}\approx {\underline{G^{*2}}} \,, \quad G^*_{jj} \approx {\underline{G^*}} \,, \quad G_{ii},G_{jj} \approx {\underline{G}} \,, \quad \text {and}\;\, (\partial _2{{\overline{P}} \,}) {\underline{G^{*2}}} \,= \partial _{{{\overline{w}} \,}} P({{\overline{z}} \,},{\underline{G}} \,^*) -{\underline{G^*}} \, \end{aligned}$$

which of course have to be justified. Ignoring the error terms generated in this process, we find that (3.3) is reduced to

$$\begin{aligned}&\frac{1}{N^2}\sum _{i,j}{\mathcal {C}}_4(H_{ij}){\mathbb {E}}\Big [ \partial _{{{\overline{w}} \,}} P({{\overline{z}} \,},{\underline{G}} \,^*) {\underline{G^*}} \,\,{\underline{G}} \,^2|P|^{2n-2}\Big ]\\&\quad =\frac{1}{N^3}\sum _{i,j,k,l}{\mathcal {C}}_4(H_{ij}){\mathbb {E}} \Big [ \partial _{{{\overline{w}} \,}} (H_{kl}G^{*}_{lk}) {\underline{G^*}} \,\,{\underline{G}} \,^2|P|^{2n-2}\Big ] \\&\qquad +\frac{1}{N^2}\sum _{i,j}{\mathcal {C}}_4(H_{ij}){\mathbb {E}}\Big [ \partial _{{{\overline{w}} \,}} ( {{\overline{P}} \,}-{\underline{HG^*}} \,) {\underline{G^*}} \,\,{\underline{G}} \,^2|P|^{2n-2}\Big ]. \end{aligned}$$

Since \(\partial _{{{\overline{w}} \,}}\) and \(\partial /\partial H_{ij}\) commute, we can again expand the first term on the right-hand side with Lemma 2.1. In this way the operator \(\partial _{{{\overline{w}} \,}}\) plays no role in our computation, and we can get the desired estimate using the smallness of \({\mathbb {E}} {{\overline{P}} \,}\).

A major difficulty in the above argument results from the fact that we need to track carefully the algebraic structure of the error terms arising from repeated applications of simplifications of the form (3.4). In particular, such terms occur inside expectations multiplied by many other factors, and we need to ensure that such approximations remain valid in general expressions. In order to achieve this, we implement the ideas in [12, 14] to construct a hierarchy of Schwinger–Dyson equations for a sufficiently large class of polynomials in the entries of the Green’s function.

The desired bound for P, Proposition 6.1, together with the stability analysis of the self-consistent equation associated with P (Lemma 6.2 below), yields the key estimate

$$\begin{aligned} |{\underline{G}} \,-m |\ll \sqrt{\kappa }, \end{aligned}$$
(3.5)

where we recall that \({{\,\mathrm{Re}\,}}z = L_0 + {\mathcal {Z}} + \kappa \). This estimate is crucial in establishing the stability of the self-consistent equation associated with \({{\,\mathrm{Im}\,}}P\) (see Lemma 6.4). More precisely, a Taylor expansion shows

$$\begin{aligned} P(z,{\underline{G}} \,)=\partial _2 P(z,m)({\underline{G}} \,-m)+\frac{1}{2}\partial _2^2 P(z,m)({\underline{G}} \,-m)^2 +\cdots . \end{aligned}$$

As \(\partial _2^2 P(z,m)\approx 2\), taking the imaginary part and rearranging terms yields

$$\begin{aligned} {{\,\mathrm{Re}\,}}\partial _2 P(z,m) {{\,\mathrm{Im}\,}}({\underline{G}} \,-m)&={{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,)-{{\,\mathrm{Im}\,}}\partial _2 P(z,m) {{\,\mathrm{Re}\,}}({\underline{G}} \,-m) \nonumber \\&\quad -2 {{\,\mathrm{Re}\,}}({\underline{G}} \,-m){{\,\mathrm{Im}\,}}({\underline{G}} \,- m)+\cdots . \end{aligned}$$
(3.6)

It can be shown that \(|{{\,\mathrm{Re}\,}}\partial _2 P(z,m) | \asymp \sqrt{\kappa }\), and we move this factor to the right-hand side of (3.6) to obtain a recursive estimate of \({{\,\mathrm{Im}\,}}({\underline{G}} \,-m)\). The third term on the right-hand side of (3.6) shows that in order for this estimate to work, we need

$$\begin{aligned} |{\underline{G}} \,-m| \ll |{{\,\mathrm{Re}\,}}\partial _2 P(z,m) | \asymp \sqrt{\kappa }, \end{aligned}$$

which is exactly (3.5).

The final step in showing (3.1) is to bound the high moments of \({{\,\mathrm{Im}\,}}P\) in Proposition 6.3. As \({{\,\mathrm{Im}\,}}P\) is much smaller than P near the edge, we obtain a much smaller bound for \({\mathbb {E}} |{{\,\mathrm{Im}\,}}P |^{2n}\) than for \({\mathbb {E}} |P|^{2n}\). The proof is similar to that of Proposition 6.1, but contains significantly fewer expansions. Combining Proposition 6.3 and Lemma 6.4 leads to our desired estimate of \({{\,\mathrm{Im}\,}}{\underline{G}} \,\), which is

$$\begin{aligned} |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m| \prec \frac{1}{N^{1+\delta }\eta } . \end{aligned}$$

As we prove the above for z satisfying \({{\,\mathrm{Im}\,}}m \ll \frac{1}{N\eta }\), we get (3.1) as desired.

4 Proof of Theorem 1.2

In this section we prove Theorem 1.2. The key result is the following upper bound on the largest eigenvalue of H. The proof is postponed to Sect. 6.

Proposition 4.1

Denoting by \(\mu _N\) the largest eigenvalue of H, we have

$$\begin{aligned} ( \mu _{N}-L_0-{\mathcal {Z}} )_+\prec \frac{1}{N^{1/2+\delta }q}. \end{aligned}$$
(4.1)

We also need the following result to estimate the eigenvalues away from the spectral edges. The proof is postponed to Sect. 9.

Lemma 4.2

Let \(\rho \) denote the empirical eigenvalue density of H, and set

$$\begin{aligned} I_1:= & {} \left[ -\frac{1}{2},L_0+{\mathcal {Z}}-\frac{2}{N^{1/2+\delta }q}\right] ,\\ I_2:= & {} \left[ L_0+{\mathcal {Z}}-\frac{2}{N^{1/2+\delta }q},L_0+{\mathcal {Z}}+\frac{2}{N^{1/2+\delta }q}\right] . \end{aligned}$$

We have

$$\begin{aligned} |\rho (I)-\varrho (I)| \prec \frac{1}{N}+\sqrt{\frac{|I|}{Nq^3}} \end{aligned}$$
(4.2)

for all \(I \subset I_1\) and \(I=I_2\).

Proof of Theorem 1.2

We prove (1.8) for \(i \in \{\lfloor {N/2} \rfloor -1,\ldots ,N-1\}\), and the same analysis works for the other half of the spectrum. Let \(i \in \{\lfloor {N/2} \rfloor -1,\ldots ,N-1\}\) and suppose first that

$$\begin{aligned} \gamma _i, \lambda _i \geqslant L_0+{\mathcal {Z}}-\frac{2}{N^{1/2+\delta }q}. \end{aligned}$$
(4.3)

Then trivially we have \(\gamma _i \in I_2\) with very high probability. In addition, by the Cauchy interlacing theorem we have \(\lambda _i \leqslant \mu _N\), and together with Proposition 4.1 we obtain

$$\begin{aligned} (\lambda _i-L_0-{\mathcal {Z}})_+ \prec \frac{1}{N^{1/2+\delta }q}. \end{aligned}$$

Thus by the triangle inequality we get

$$\begin{aligned} \lambda _i-\gamma _i \prec \frac{1}{N^{1/2+\delta }q}. \end{aligned}$$
(4.4)

Next, suppose (4.3) does not hold, namely

$$\begin{aligned} \min \{\gamma _i, \lambda _i\}=L_0 + {\mathcal {Z}} -\frac{2}{N^{1/2+\delta }q}-a \end{aligned}$$

for some \(a\in (0,3)\). Let \(\nu \) be the empirical eigenvalue density of A. By the Cauchy interlacing theorem,

$$\begin{aligned} |\rho (I) -\nu (I)| \leqslant \frac{1}{N} \end{aligned}$$

for any \(I \subset {\mathbb {R}}\). Together with (4.2), we have

$$\begin{aligned} |\nu (I)-\varrho (I)| \prec \frac{1}{N}+\sqrt{\frac{|I|}{Nq^3}} \end{aligned}$$
(4.5)

for all \(I \subset I_1\) or \(I=I_2\). Let \(f(E):=\varrho ([E,\infty ))\). Then

$$\begin{aligned} f(\gamma _i)=\frac{N+1-i}{N}=\nu ((\lambda _i,\infty ])=f(\lambda _i)+O_{\prec }\left( \frac{1}{N}+\sqrt{\frac{|I_2|+a}{Nq^3}}\,\right) , \end{aligned}$$

where in the last step we used (4.5). By the definition of \(I_2\) we get \( |f(\gamma _i)-f(\lambda _{i})| \prec N^{-\delta }(|I_2|+a)^{3/2}.\) Together with the uniform square root behaviour of the density of \(\varrho \) near L from Lemma 2.11 we therefore have

$$\begin{aligned} f(\lambda _i) \vee f(\gamma _i) \geqslant c (|I_2|+a)^{3/2}\geqslant N^{\delta } |f(\gamma _i)-f(\lambda _i)| \end{aligned}$$

with very high probability, where \(c > 0\) is a constant. Thus

$$\begin{aligned} f(\gamma _i)=f(\lambda _i)(1+O(N^{-\delta })) \end{aligned}$$

with very high probability. Since \(f(x)\asymp (L-x)^{3/2}\) for \(x \in I_1\), we deduce that \(L-\gamma _i\asymp L-\lambda _{i}\) with very high probability. Moreover, by Lemma 2.11 we have \(f'(x)\asymp (L-x)^{1/2}\) for \(x \in I_1\), which implies \(f'(\lambda _i)\asymp f'(\gamma _i)\) with very high probability, and hence that \(f'(x)\asymp f'(\gamma _i)\) with very high probability for any x between \(\lambda _{i}\) and \(\gamma _i\). Thus the mean value theorem yields

$$\begin{aligned} |\lambda _i-\gamma _i|\asymp \frac{|f(\lambda _i)-f(\gamma _i)|}{f'(\gamma _i)}\prec \frac{1}{N\sqrt{|I_2|+a}}+\frac{1}{\sqrt{N}q^{3/2}} \prec \frac{1}{\sqrt{N}q^{3/2}}. \end{aligned}$$

Using the above relation, together with (4.4) and Lemma 2.12, we conclude that

$$\begin{aligned} \lambda _i-\gamma _{0,i}-\frac{\gamma _{\mathrm {sc},i}}{2}{\mathcal {Z}} \prec \frac{1}{N^{1/2+\delta }q}. \end{aligned}$$

We then take the expectation using Lemma 2.4, which yields

$$\begin{aligned} {\mathbb {E}} \lambda _i -\gamma _{0,i} \prec \frac{1}{N^{1/2+\delta }q}. \end{aligned}$$

Combining the above two formulas we have (1.8) as desired. \(\square \)

5 Abstract polynomials and the construction of \(P_0\)

Convention

Throughout this section, \(z \in {{\mathbf {S}}}\) is deterministic.

In this section we construct the polynomial \(P_0\) and prove Lemma 2.9. The latter was essentially proved in [19, Proposition 2.9]; here we follow a more systematic approach, based on a class of abstract polynomials in the Green’s function entries, which provides an explicit proof. We shall generalize this class further in Sect. 7.

5.1 Abstract polynomials, part I

We start by introducing a notion of formal monomials in a set of formal variables, which are used to construct \(P_0\). Here the word formal refers to the fact that these definitions are purely algebraic and we do not assign any values to variables or monomials.

Definition 5.1

Let \(\{i_1,i_2,\ldots \}\) be an infinite set of formal indices. To \(\sigma , \nu _1 \in {\mathbb {N}}\), \(\theta \in {\mathbb {R}}\), \(x_1, y_1, \ldots , x_\sigma , y_\sigma \in \{i_1, \ldots , i_{\nu _1}\}\), and a family \((a_{i_1,\ldots ,i_{\nu _1}})_{1\leqslant i_1,\ldots ,i_{\nu _1}\leqslant N}\) of uniformly bounded complex numbers we assign a formal monomial

$$\begin{aligned} T = a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta } G_{x_1 y_1} \ldots G_{x_\sigma y_\sigma }. \end{aligned}$$
(5.1)

We denote \(\sigma (T) = \sigma \), \(\nu _1(T) = \nu _1\), \(\theta (T) = \theta \), and \(\nu _2(T) :=\sum _{k = 1}^\sigma {{\mathbf {1}}}_{x_k \ne y_k}\). Thus, \(\sigma (T)\) is the degree of T and \(\nu _2(T)\) is the number of off-diagonal Gs. We denote by \({\mathcal {T}}\) the set of formal monomials T of the form (5.1).

Definition 5.2

We assign to each monomial \(T \in {\mathcal {T}}\) with \(\nu _1 = \nu _1(T)\) its evaluation

$$\begin{aligned} T_{i_1,\ldots ,i_{\nu _1}} \equiv T_{i_1,\ldots ,i_{\nu _1}}(z), \end{aligned}$$

which is a random variable depending on a \(\nu _1\)-tuple \((i_1,\ldots ,i_{\nu _1})\in \{1,2,\ldots ,N\}^{\nu _1}\). It is obtained by replacing, in the formal monomial T, the formal indices \(i_1,\ldots ,i_{\nu _1}\) with the integers \(i_1,\ldots ,i_{\nu _1}\) and the formal variables \(G_{xy}\) with elements \(G_{xy}\) of the Green’s function (2.1) with parameter z. We define

$$\begin{aligned} {\mathcal {S}} (T) :=\sum _{i_1,\ldots ,i_{\nu _1}} T_{i_1,\ldots ,i_{\nu _1}}. \end{aligned}$$
(5.2)
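
Although these definitions are purely formal, they are straightforward to encode; the following toy sketch (ours, purely to fix ideas; all names are our own) implements a monomial T with its attributes and the evaluation map (5.2) for a concrete Green's function.

```python
# Toy encoding (ours) of Definitions 5.1-5.2: a formal monomial T with its
# attributes and the evaluation map S(T) of (5.2).
import itertools
import numpy as np
from dataclasses import dataclass

@dataclass
class Monomial:
    theta: float     # prefactor N^{-theta}
    nu1: int         # number of formal indices i_1, ..., i_{nu1}
    pairs: list      # [(x_1, y_1), ..., (x_sigma, y_sigma)], formal indices
                     # encoded as integers 0, ..., nu1 - 1

    @property
    def sigma(self):  # degree of T
        return len(self.pairs)

    @property
    def nu2(self):    # number of off-diagonal factors G_{x y}, x != y
        return sum(x != y for x, y in self.pairs)

    def S(self, G, a=lambda *idx: 1.0):
        # sum the evaluations T_{i_1,...,i_{nu1}} over all index tuples, (5.2)
        N = G.shape[0]
        return sum(
            a(*idx) * N ** (-self.theta)
            * np.prod([G[idx[x], idx[y]] for x, y in self.pairs])
            for idx in itertools.product(range(N), repeat=self.nu1)
        )

# Example: T = N^{-theta} G_{i1 i1} G_{i2 i2} G_{i1 i2}, so sigma = 3, nu2 = 1.
T = Monomial(theta=2.3, nu1=2, pairs=[(0, 0), (1, 1), (0, 1)])
rng = np.random.default_rng(7)
H = rng.standard_normal((20, 20)); H = (H + H.T) / np.sqrt(40)
G = np.linalg.inv(H - (0.5 + 0.05j) * np.eye(20))
print(T.sigma, T.nu2, T.S(G))
```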

Defining the random variable

$$\begin{aligned} \Gamma \equiv \Gamma (z):=\frac{ {{\,\mathrm{Im}\,}}{\underline{G}} \,(z)}{N\eta }, \end{aligned}$$
(5.3)

we have the following result, whose proof is given in Sect. 10.1 below.

Lemma 5.3

For any fixed \(T \in {\mathcal {T}}\) we have

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T) \prec N^{\nu _1(T)-\theta (T)}\big (\delta _{ 0 \nu _2(T)}+ {\mathbb {E}}\Gamma +N^{-1}\big ) . \end{aligned}$$
(5.4)

Remark 5.4

When \(\nu _2(T)\ne 1\), Lemma 5.3 is a straightforward consequence of Lemma 2.8 and Proposition 2.5. When \(\nu _2(T)=1\), naively applying the Ward identity shows

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T) \prec N^{\nu _1(T)-\theta (T)}{\mathbb {E}}\sqrt{\Gamma }. \end{aligned}$$

In this case, therefore, Lemma 5.3 extracts an additional factor of \(\sqrt{\Gamma }\).

In the sequel we also need the subset

$$\begin{aligned} {\mathcal {T}}_0 :=\{T\in {\mathcal {T}} :\nu _2(T)=0\} \end{aligned}$$

of formal monomials without off-diagonal entries. We define an averaging map \({\mathcal {M}}\) from \({\mathcal {T}}_0\) to the space of random variables through

$$\begin{aligned} {\mathcal {M}}(T)=\sum _{i_1,\ldots ,i_{\nu _1}}a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta } {\underline{G}} \,^{\sigma }, \end{aligned}$$
(5.5)

for \(T =a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }G_{x_1x_1}G_{x_2x_2}\ldots G_{x_{\sigma }x_{\sigma }} \in {\mathcal {T}}_0\). The interpretation of \({\mathcal {M}}(T)\) is that it replaces all diagonal entries of G in T by their average \({{\underline{G}} \,}\) and then applies \({\mathcal {S}}\). Note that it is only applied to monomials \(T \in {\mathcal {T}}_0\) without off-diagonal entries. The following result is proved in Sect. 10.2 below.

Lemma 5.5

For any fixed \(T \in {\mathcal {T}}_0\) there exists \(k \in {\mathbb {N}}\) and \(T^{(1)},\ldots ,T^{(k)} \in {\mathcal {T}}_0\) such that

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T) ={\mathbb {E}} {\mathcal {M}}(T)+\sum _{l=1}^k{\mathbb {E}}\, {\mathcal {S}}\big (T^{(l)}\big )+O_{\prec }\big (N^{\nu _1(T)-\theta (T)}\big ({\mathbb {E}} \Gamma +N^{-1}\big )\big ). \end{aligned}$$
(5.6)

Each \(T^{(l)}\) satisfies \(\sigma (T^{(l)})-\sigma (T) \in 2{\mathbb {N}}+4\),

$$\begin{aligned} \nu _1\big (T^{(l)}\big )=\nu _1(T)+1, \quad \text {and} \quad \theta \big (T^{(l)}\big )=\theta (T)+1+\beta \big (\sigma \big (T^{(l)}\big )-\sigma (T)-2\big ). \end{aligned}$$

Lemma 5.5 leads to the following result.

Lemma 5.6

Fix \(T \in {\mathcal {T}}_0\) and fix \(r \in {\mathbb {N}}_+\). Then there exist deterministic and bounded \(b_2,\ldots ,b_{r}\) such that

$$\begin{aligned} {\mathcal {M}}(r,T):={\mathcal {M}}(T)+N^{\nu _1(T)-\theta (T)}\sum _{l=2}^{r} b_{l} N^{-l\beta } {\underline{G}} \,^{\sigma (T)+2l} \end{aligned}$$
(5.7)

satisfies

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(T)={\mathbb {E}} {\mathcal {M}}(r, T) +O_{\prec }\big (N^{\nu _1(T)-\theta (T)}\big ({\mathbb {E}} \Gamma +N^{-1}+N^{-\beta (r+1)}\big )\big ). \end{aligned}$$

Proof

When \(r=1\), the lemma follows trivially from Lemma 5.5. When \(r \geqslant 2\), the proof is essentially a repeated use of Lemma 5.5. More precisely, by Lemma 5.5,

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T) ={\mathbb {E}} {\mathcal {M}}(T)+\sum _{l=1}^k{\mathbb {E}}\, {\mathcal {S}}(T^{(l)})+O_{\prec }\big (N^{\nu _1(T)-\theta (T)}\big ({\mathbb {E}} \Gamma +N^{-1}\big )\big ) \end{aligned}$$
(5.8)

for some fixed \(k \in {\mathbb {N}}\), where each \(T^{(l)}\) satisfies \(\sigma (T^{(l)})-\sigma (T) \in 2{\mathbb {N}}+4\), \(\nu _1(T^{(l)})=\nu _1(T)+1\) and \(\theta (T^{(l)})=\theta (T)+1+\beta (\sigma (T^{(l)})-\sigma (T)-2)\). As a result, \({\mathbb {E}} {\mathcal {S}}(T^{(l)})=O_{\prec }\big (N^{\nu _1(T)-\theta (T)-2\beta }\big )\) for each l. Now we apply Lemma 5.5 to each \(T^{(l)}\) on the right-hand side of (5.8), and get

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \,{\mathcal {S}}\big (T^{(l)}\big )&={\mathbb {E}} {\mathcal {M}}\big (T^{(l)}\big )+\sum _{l_1=1}^{k_l}{\mathbb {E}}\, {\mathcal {S}}\big (T^{(l,l_1)}\big )+O_{\prec }\big (N^{\nu _1\big (T^{(l)}\big )-\theta \big (T^{(l)}\big )}\big ({\mathbb {E}} \Gamma +N^{-1}\big )\big )\\&={\mathbb {E}} {\mathcal {M}}\big (T^{(l)}\big )+\sum _{l_1=1}^{k_l}{\mathbb {E}}\, {\mathcal {S}}\big (T^{(l,l_1)}\big )+O_{\prec }\big (N^{\nu _1(T)-\theta (T)}\big ({\mathbb {E}} \Gamma +N^{-1}\big )\big ) \end{aligned} \end{aligned}$$
(5.9)

for some fixed \(k_l \in {\mathbb {N}}\), where each \(T^{(l,l_1)}\) satisfies \(\sigma (T^{(l,l_1)})-\sigma (T) \in 2{\mathbb {N}}+8\), \(\nu _1(T^{(l,l_1)})=\nu _1(T)+2\) and \(\theta (T^{(l,l_1)})=\theta (T)+2+\beta (\sigma (T^{(l,l_1)})-\sigma (T)-4)\). Moreover, by our conditions on \(\sigma (T^{(l)})\), \(\nu _1(T^{(l)})\), and \(\theta (T^{(l)})\), we can write

$$\begin{aligned} \sum _{l=1}^{k}{\mathbb {E}} {\mathcal {M}}\big ({T^{(l)}}\big )=N^{\nu _1(T)-\theta (T)}\sum _{l=2}^{r} b_{l,1} N^{-l\beta } {\mathbb {E}}{\underline{G}} \,^{\sigma (T)+2l} \end{aligned}$$
(5.10)

for some deterministic and bounded \(b_{2,1},\ldots ,b_{r,1}\). Combining (5.8)–(5.10), we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T) =&\,{\mathbb {E}} {\mathcal {M}}(T)+N^{\nu _1(T)-\theta (T)}\sum _{l=2}^{r} b_{l,1} N^{-l\beta } {\mathbb {E}}{\underline{G}} \,^{\sigma (T)+2l}\\&+\sum _{l=1}^k\sum _{l_1=1}^{k_l}{\mathbb {E}}\, {\mathcal {S}}\big (T^{(l,l_1)}\big )+O_{\prec }\big (N^{\nu _1(T)-\theta (T)}\big ({\mathbb {E}} \Gamma +N^{-1}\big )\big ). \end{aligned} \end{aligned}$$
(5.11)

Note that \({\mathbb {E}} {\mathcal {S}}(T^{(l,l_1)})=O_{\prec }\big (N^{\nu _1(T)-\theta (T)-4\beta }\big )\) for each \((l,l_1)\). Thus we can again apply Lemma 5.5 to each \(T^{(l,l_1)}\) on the right-hand side of (5.11). Repeating the above steps finitely many times completes the proof. \(\square \)

Note that in particular we have \({\mathcal {M}}(1,T)={\mathcal {M}}(T)\) by (5.7).

5.2 The construction of \(P_0\) and proof of Lemma 2.9

We compute

$$\begin{aligned} {\mathbb {E}} (1+z{\underline{G}} \,)= {\mathbb {E}}{\underline{HG}} \,=\frac{1}{N}\sum _{i,j} {\mathbb {E}} H_{ij}G_{ji}, \end{aligned}$$

and we shall find a polynomial \(Q_0\) such that

$$ {\mathbb {E}} (1+z{\underline{G}} \,)+{\mathbb {E}} Q_0({\underline{G}} \,) \prec {\mathbb {E}} \Gamma +\frac{1}{N}. $$

We then set \(P_0(z,x)=1+zx+Q_0(x)\). Using Lemma 2.1 with \(h=H_{ij}\) and \(f=f_{ji}(H)=G_{ji}\), we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}} (1+z{\underline{G}} \,)&=\frac{1}{N}\sum _{k=1}^{\ell }\frac{1}{k!}\sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}}\frac{\partial ^k G_{ji}}{\partial H_{ij}^k}+\frac{1}{N}\sum _{i,j}{\mathbb {E}}{\mathcal {R}}^{(ji)}_{\ell +1}\\&=:\sum _{k=1}^{\ell }{\widetilde{X}}_k+\frac{1}{N}\sum _{i,j}{\mathbb {E}}{\mathcal {R}}^{(ji)}_{\ell +1}, \end{aligned} \end{aligned}$$
(5.12)

where \(\ell \) is a fixed positive integer to be chosen later, and \({\mathcal {R}}^{(ji)}_{\ell +1}\) is a remainder term defined analogously to \({\mathcal {R}}_{\ell +1}\) in (2.4). One can follow, e.g. the proof of Lemma 3.4 (iii) in [16], and readily check that

$$\begin{aligned} \frac{1}{N}\sum _{i,j}{\mathbb {E}}{\mathcal {R}}^{(ji)}_{\ell +1}=O(N^{-1}) \end{aligned}$$

for \(\ell \equiv \ell (\beta )\) large enough. From now on, we always assume that the remainder term in the cumulant expansion is negligible.

Now let us look at each \({\widetilde{X}}_k\). For \(k=1\), by the differential rule (2.2) and \({\mathcal {C}}_{2}(H_{ij})=1/N\) for \(i \ne j\), we have

$$\begin{aligned} {\widetilde{X}}_1= & {} -\frac{1}{N^2}\sum _{i,j} {\mathbb {E}} \big (G_{ij}^2+G_{ii}G_{jj}\big )-\frac{1}{N^2}\sum _{i} (N{\mathcal {C}}_2(H_{ii})-2){\mathbb {E}} G_{ii}^2 \nonumber \\= & {} -{\mathbb {E}} {\underline{G}} \,^2+O_{\prec }\Big ({\mathbb {E}} \Gamma +\frac{1}{N}\Big ). \end{aligned}$$
(5.13)

For \(k=2\), the most dangerous term is

$$\begin{aligned} \frac{1}{N} \sum _{i,j} {\mathcal {C}}_3(H_{ij}){\mathbb {E}} G_{ij}G_{ii}G_{jj}=:\sum _{i,j}{\mathbb {E}} T_{ij}, \end{aligned}$$

and by \({\mathcal {C}}_3(H_{ij})=O(N^{-1-\beta })\), we see that \(\nu _1(T) = 2\), \(\theta (T) = 2 + \beta \), and \(\nu _2(T)=1\). Thus by Lemma 5.3 we have

$$\begin{aligned} \sum _{i,j} {\mathbb {E}} T_{ij} \prec N^{-\beta }\Big ({\mathbb {E}} \Gamma +\frac{1}{N}\Big ). \end{aligned}$$

Other terms in \({\widetilde{X}}_2\) also satisfy the same bound. Similar estimates can also be done for all even k, which yield

$$\begin{aligned} \sum _{s=1}^{\lceil {\ell /2} \rceil } {\widetilde{X}}_{2s} \prec {\mathbb {E}} \Gamma +\frac{1}{N}. \end{aligned}$$
(5.14)

For odd \(k \geqslant 3\), we split

$$\begin{aligned} {\widetilde{X}}_k={\widetilde{X}}_{k,1}+{\widetilde{X}}_{k,2}, \end{aligned}$$

where terms in \({\widetilde{X}}_{k,1}\) contain no off-diagonal entries of G. Using Lemma 5.3, we easily find

$$\begin{aligned} {\widetilde{X}}_{k,2} \prec {\mathbb {E}} \Gamma +\frac{1}{N}. \end{aligned}$$

By Lemma 2.2, we see that

$$\begin{aligned} {\widetilde{X}}_{k,1}=\frac{1}{N^{2+(k-1)\beta }}\sum _{i,j} a^{(k)}_{i,j}{\mathbb {E}} G^{(k+1)/2}_{ii}G_{jj}^{(k+1)/2}, \end{aligned}$$

where \(a^{(k)}_{ij}\) is deterministic and uniformly bounded. Combining with (5.12)–(5.14), we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}} (1+z{\underline{G}} \,)+{\mathbb {E}} {\underline{G}} \,^2+O_{\prec }\Big ({\mathbb {E}} \Gamma +\frac{1}{N}\Big )&=\sum _{s=2}^{\lceil {\ell /2} \rceil }\frac{1}{N^{2+(2s-2)\beta }}\sum _{i,j} a^{(2s-1)}_{ij}{\mathbb {E}} G^{s}_{ii}G_{jj}^{s} \\&=:\sum _{s=2}^{\lceil {\ell /2} \rceil } {\mathbb {E}} \,{\mathcal {S}}\big (T^{(s)}\big ) , \end{aligned} \end{aligned}$$
(5.15)

where

$$\begin{aligned} T^{(s)}=\frac{1}{N^{2+(2s-2)\beta }} a_{ij}^{(2s-1)} G_{ii}^sG_{jj}^s. \end{aligned}$$
(5.16)

To handle the right-hand side of (5.15) we invoke Lemma 5.6. Naively, we have

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T^{(s)}) \prec N^{(2-2s)\beta } \end{aligned}$$
(5.17)

for each s. By Lemma 5.6, we can write

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T^{(s)})= {\mathbb {E}} {\mathcal {M}}\big (\lceil {\beta ^{-1}} \rceil -2s+2,T^{(s)}\big )+O_{\prec }\big ({\mathbb {E}} \Gamma +N^{-1}\big ). \end{aligned}$$
(5.18)

Thus (5.15) becomes

$$\begin{aligned} {\mathbb {E}} (1+z{\underline{G}} \,)+{\mathbb {E}} {\underline{G}} \,^2-\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} {\mathcal {M}}\big (\lceil {\beta ^{-1}} \rceil -2s+2,T^{(s)}\big )=O_{\prec }\Big ({\mathbb {E}} \Gamma +N^{-1}\Big ). \end{aligned}$$

Hence we can set

$$\begin{aligned} Q_0({\underline{G}} \,)= {\underline{G}} \,^2-\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathcal {M}}\big (\lceil {\beta ^{-1}} \rceil -2s+2,T^{(s)}\big ), \end{aligned}$$

and note that \(Q_0\) is a polynomial of degree \(2\lceil {\beta ^{-1}} \rceil \). This concludes the proof of Lemma 2.9.

Remark 5.7

After the construction of \(P_0\) (and consequently P), we shall construct a more general class of abstract polynomials associated with P in Sect. 7.2 below.

6 Proof of Proposition 4.1

Convention

Throughout this section,

$$\begin{aligned} z=L_0+{\mathcal {Z}}+w, \end{aligned}$$
(6.1)

where \(w=\kappa +\mathrm {i}\eta \) is deterministic.

The proof of Proposition 4.1 consists of two steps: in the first we estimate \({\underline{G}} \,\), and in the second we apply this estimate to obtain a more precise bound on \({{\,\mathrm{Im}\,}}{\underline{G}} \,\).

6.1 Estimate of \({\underline{G}} \,\)

Define the spectral domain

$$\begin{aligned} {{\mathbf {Y}}}\equiv {{\mathbf {Y}}}(\delta ) =\bigg \{ w=\kappa +\mathrm {i}\eta \in {\mathbb {C}}_+:\frac{N^{-\delta }}{\sqrt{N}q}\leqslant \kappa \leqslant 1, N^{-\delta } N^{-5/8}q^{-1/4}\leqslant \eta \leqslant 1\bigg \}.\nonumber \\ \end{aligned}$$
(6.2)

As a guide to the reader, the lower bound on \(\kappa \) is chosen to be slightly smaller than the scale \(\frac{1}{\sqrt{N}q}\) on which the extreme eigenvalues fluctuate; analogously, the lower bound on \(\eta \) is chosen to be slightly smaller than the scale \(N^{-5/8} q^{-1/4}\), which is the solution of the equation \(\frac{\eta }{\sqrt{\kappa }} = \frac{1}{N \eta }\) with \(\kappa = \frac{1}{\sqrt{N}q}\). Using that \({{\,\mathrm{Im}\,}}m(z) \asymp \frac{\eta }{\sqrt{\kappa }}\) (see Lemma 2.11), this choice of lower bound on \(\eta \) will allow us to rule out the presence of eigenvalues [see (6.6) below], and hence establish rigidity.
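For the reader's convenience, the scale \(N^{-5/8}q^{-1/4}\) is obtained by the elementary computation

$$\begin{aligned} \frac{\eta }{\sqrt{\kappa }}=\frac{1}{N\eta }, \quad \kappa =\frac{1}{\sqrt{N}q} \implies \eta ^2=\frac{\sqrt{\kappa }}{N}=N^{-5/4}q^{-1/2} \implies \eta =N^{-5/8}q^{-1/4}. \end{aligned}$$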

Recall the definition of \(\tau \) in Lemma 2.11, and note that the lower bound on \(\kappa \) ensures, with very high probability,

$$\begin{aligned} \tau (z) =|(L_0+{\mathcal {Z}}+\kappa )^2-L^2|\asymp \kappa \end{aligned}$$
(6.3)

for all \(w \in {{\mathbf {Y}}}\). The main technical step is the following bound for \(P(z,{\underline{G}} \,)\), whose proof is postponed to Sect. 7.

Proposition 6.1

Let \(w \in {{\mathbf {Y}}}\). Suppose \(|{\underline{G}} \,-m| \prec \Psi \) for some deterministic \(\Psi \in [\sqrt{\kappa }N^{-\delta },1]\). Then

$$\begin{aligned} P(z, {\underline{G}} \,) \prec \big (\kappa +\Psi ^2\big ) N^{-\delta }. \end{aligned}$$

Lemma 6.2

Suppose \(\varepsilon :{{\mathbf {Y}}}\rightarrow [N^{-1},N^{-\delta }]\) is a function so that

$$ P(z, {\underline{G}} \,) \prec \varepsilon (w) $$

for all \(w \in {{\mathbf {Y}}}\). Suppose \(\varepsilon (w)\) is Lipschitz continuous with Lipschitz constant N and moreover that for each fixed \(\kappa \) the function \(\eta \rightarrow \varepsilon (\kappa +\mathrm {i}\eta )\) is nonincreasing for \(\eta >0\). Then

$$\begin{aligned} |{\underline{G}} \,-m| \prec \frac{\varepsilon }{\sqrt{|\kappa | +\eta +\varepsilon }}. \end{aligned}$$

Proof

See [19, Proposition 2.11]. \(\square \)

Combining Proposition 6.1 and Lemma 6.2, for any deterministic \(\Psi \) that does not depend on \(\eta \) we obtain the implication

$$\begin{aligned} |{\underline{G}} \,-m| \prec \Psi \implies |{\underline{G}} \,-m| \prec \sqrt{\kappa }N^{-\delta }+\Psi N^{-\delta /2}. \end{aligned}$$

Using the initial estimate \(|{\underline{G}} \,-m| \prec 1\) from Proposition 2.5 and iterating this implication \(O(1/\delta )\) times, we conclude the key bound

$$\begin{aligned} |{\underline{G}} \,-m| \prec \sqrt{\kappa } N^{-\delta }. \end{aligned}$$
(6.4)
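Explicitly, the conclusion follows from a standard bootstrap: setting \(\Psi _0:=1\) and \(\Psi _{k+1}:=\sqrt{\kappa }N^{-\delta }+\Psi _k N^{-\delta /2}\), the implication above gives \(|{\underline{G}} \,-m| \prec \Psi _k\) for every fixed k, and

$$\begin{aligned} \Psi _k \leqslant \frac{\sqrt{\kappa }N^{-\delta }}{1-N^{-\delta /2}}+N^{-k\delta /2}. \end{aligned}$$

Since \(\sqrt{\kappa }N^{-\delta } \geqslant N^{-1}\) on \({{\mathbf {Y}}}\) for small \(\delta \), taking \(k=O(1/\delta )\) steps absorbs the second term and yields (6.4).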

6.2 Estimate of \({{\,\mathrm{Im}\,}}{\underline{G}} \,\)

Define the subset

$$\begin{aligned} {{\mathbf {Y}}}_{*}\equiv {{\mathbf {Y}}}_*(\delta ) =\bigg \{ w=\kappa +\mathrm {i}\eta \in {\mathbb {C}}_+:\frac{N^{-\delta }}{\sqrt{N}q}\leqslant \kappa \leqslant 1, \eta =N^{-\delta } N^{-5/8}q^{-1/4}\bigg \} \subset {{\mathbf {Y}}}.\nonumber \\ \end{aligned}$$
(6.5)

In this section we show that

$$\begin{aligned} {{\,\mathrm{Im}\,}}{\underline{G}} \, \prec \frac{1}{N^{1+\delta }\eta } \end{aligned}$$
(6.6)

for all \(w \in {{\mathbf {Y}}}_*\). This immediately implies that whenever \(\kappa +\mathrm {i}\eta \in {{\mathbf {Y}}}_*\), with very high probability there is no eigenvalue in the interval \((L_0+{\mathcal {Z}}+\kappa -\eta ,L_0+{\mathcal {Z}}+\kappa +\eta )\). In addition, [9, Lemma 4.4] implies

$$\begin{aligned} \Vert H\Vert -2 \prec \frac{1}{q}, \end{aligned}$$

and hence the largest eigenvalue \(\mu _N\) of H satisfies (4.1), and Proposition 4.1 is proved.
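To spell out the eigenvalue-exclusion step used above: if H had an eigenvalue \(\mu \) with \(|\mu -(L_0+{\mathcal {Z}}+\kappa )|<\eta \), then the spectral decomposition of G would give

$$\begin{aligned} {{\,\mathrm{Im}\,}}{\underline{G}} \,=\frac{1}{N}\sum _{i}\frac{\eta }{(\mu _i-L_0-{\mathcal {Z}}-\kappa )^2+\eta ^2} \geqslant \frac{1}{N}\cdot \frac{\eta }{2\eta ^2}=\frac{1}{2N\eta }, \end{aligned}$$

where \(\mu _1 \leqslant \cdots \leqslant \mu _N\) denote the eigenvalues of H; this contradicts (6.6) with very high probability.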

What remains, therefore, is the proof of (6.6). In analogy to Proposition 6.1, we have the following estimate for \( {{\,\mathrm{Im}\,}}P(z, {\underline{G}} \,)\), whose proof is postponed to Sect. 8.

Proposition 6.3

Let \(w \in {{\mathbf {Y}}}_*\). Suppose \(| {{\,\mathrm{Im}\,}}{\underline{G}} \,- {{\,\mathrm{Im}\,}}{m}| \prec \Phi \) for some deterministic \(\Phi \in [N^{-1-\delta }\eta ^{-1},1]\). Then

$$\begin{aligned} {{\,\mathrm{Im}\,}}P(z, {\underline{G}} \,) \prec \Big (\frac{1}{N\eta } +\Phi \Big ) \sqrt{\kappa }N^{-\delta }. \end{aligned}$$

Lemma 6.4

Let \(w \in {{\mathbf {Y}}}_*\). Suppose that

$${{\,\mathrm{Im}\,}}P(z, {\underline{G}} \,) \prec \varepsilon $$

for some deterministic \(\varepsilon \in [N^{-1},N^{-\delta }]\). Then

$$\begin{aligned} |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m| \prec \frac{\varepsilon }{\sqrt{\kappa }}+\frac{1}{N^{1+\delta }\eta }. \end{aligned}$$

Proof

A Taylor expansion gives

$$\begin{aligned} P(z,{\underline{G}} \,)=\partial _2 P(z,m)({\underline{G}} \,-m)+\sum _{k=2}^{\lceil {2\beta ^{-1}} \rceil }\frac{1}{k!}\partial _2^k P(z,m)({\underline{G}} \,-m)^k. \end{aligned}$$
(6.7)

Note that \(\partial ^k_2 P(z,m) \prec 1\), and, recalling the definition of P, we find from (2.5) and Lemma 2.11 that

$$\begin{aligned} {{\,\mathrm{Im}\,}}\partial ^k_2 P(z,m) \prec {{\,\mathrm{Im}\,}}z+{{\,\mathrm{Im}\,}}m \prec \frac{\eta }{\sqrt{\kappa +\eta }} \leqslant \frac{1}{N^{1+\delta }\eta } \end{aligned}$$
(6.8)

for all \(k \geqslant 1\), where the last inequality holds since for any \(w \in {{\mathbf {Y}}}_*\) we have

$$\begin{aligned} \frac{\eta }{\sqrt{\kappa +\eta }} \leqslant \frac{1}{N^{1+\delta }\eta }. \end{aligned}$$
(6.9)
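Indeed, (6.9) can be checked directly from the definition (6.5) of \({{\mathbf {Y}}}_*\): with \(\eta =N^{-\delta }N^{-5/8}q^{-1/4}\) and \(\kappa \geqslant N^{-\delta }N^{-1/2}q^{-1}\),

$$\begin{aligned} \frac{\eta }{\sqrt{\kappa +\eta }} \leqslant \frac{\eta }{\sqrt{\kappa }} \leqslant \frac{N^{-\delta }N^{-5/8}q^{-1/4}}{N^{-\delta /2}N^{-1/4}q^{-1/2}}=N^{-\delta /2}N^{-3/8}q^{1/4} \leqslant N^{-3/8}q^{1/4}=\frac{1}{N^{1+\delta }\eta }. \end{aligned}$$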

This implies, for all \(k \geqslant 2\),

$$\begin{aligned} {{\,\mathrm{Im}\,}}\Big ( \partial _2^k P(z,m)({\underline{G}} \,-m)^k \Big )&\prec \frac{1}{N^{1 +\delta }\eta }|{\underline{G}} \,-m|^k+ |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m||{\underline{G}} \,-m|^{k-1} \nonumber \\&\prec \Big ( \frac{1}{N^{1 +\delta }\eta } +|{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m|\Big )\sqrt{\kappa }N^{-\delta }, \end{aligned}$$
(6.10)

where in the second step we used (6.4). Taking the imaginary part of (6.7) and rearranging the terms, we have

$$\begin{aligned} {{\,\mathrm{Re}\,}}\partial _2 P(z,m) {{\,\mathrm{Im}\,}}({\underline{G}} \,-m)&={{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,)-{{\,\mathrm{Im}\,}}\partial _2 P(z,m) {{\,\mathrm{Re}\,}}({\underline{G}} \,-m)\\&\quad +O_{\prec }(\sqrt{\kappa }N^{-\delta })\Big ( \frac{1}{N^{1 +\delta }\eta } +|{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m|\Big ). \end{aligned}$$

Note that \(|{{\,\mathrm{Im}\,}}\partial _2 P(z,m)| \asymp |{{\,\mathrm{Im}\,}}m| \ll \sqrt{\kappa }\), and by (2.7) we have \(|\partial _2 P(z,m)|\asymp \sqrt{\kappa }\). Thus

$$\begin{aligned} |{{\,\mathrm{Re}\,}}\partial _2 P(z,m)|\asymp \sqrt{\kappa } , \end{aligned}$$

and together with (6.4) and (6.8) we have

$$\begin{aligned} |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m|&\prec \frac{|{{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,) |}{\sqrt{\kappa }}+\frac{1}{\sqrt{\kappa }} \frac{1}{N^{1+\delta }\eta }\sqrt{\kappa }N^{-\delta }\\&\quad +\frac{1}{\sqrt{\kappa }}\sqrt{\kappa }N^{-\delta }\Big ( \frac{1}{N^{1 +\delta }\eta } +|{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m|\Big ) \\&\prec \frac{\varepsilon }{\sqrt{\kappa }}+\frac{1}{N^{1+\delta }\eta }+N^{-\delta } |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m|. \end{aligned}$$

This yields the claim. \(\square \)

From Proposition 6.3 and Lemma 6.4 we obtain the implication

$$\begin{aligned} |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m| \prec \Phi \implies |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m| \prec \frac{1}{N\eta }N^{-\delta }+\Phi N^{-\delta }. \end{aligned}$$
(6.11)

Iterating (6.11) \(O(1/\delta )\) times and recalling Definition 2.3 yields

$$\begin{aligned} |{{\,\mathrm{Im}\,}}{\underline{G}} \,-{{\,\mathrm{Im}\,}}m| \prec \frac{1}{N^{1+\delta }\eta } \end{aligned}$$
(6.12)

for all \(w \in {{\mathbf {Y}}}_*\). Since

$$\begin{aligned} {{\,\mathrm{Im}\,}}m \asymp \frac{\eta }{\sqrt{\eta +\kappa }} \leqslant \frac{1}{N^{1+\delta }\eta }, \end{aligned}$$

we thus conclude (6.6). This concludes the proof of Proposition 4.1.

7 Proof of Proposition 6.1

Convention

Throughout this section, z is given by (6.1), where \(w \in {{\mathbf {Y}}}\) is deterministic.

Fix \(n \in {\mathbb {N}}_+\) and set

$$\begin{aligned} {\mathcal {P}} :=\Vert P(z,{\underline{G}} \,)\Vert _{2n}=\Big ({\mathbb {E}} |P(z,{\underline{G}} \,)|^{2n}\Big )^{\frac{1}{2n}}, \quad {\mathcal {E}} :=\big (\kappa +\Psi ^2\big ) N^{-\delta }. \end{aligned}$$

We shall show, for any fixed \(n \in {\mathbb {N}}_+\), that

$$\begin{aligned} {\mathbb {E}} |P(z,{\underline{G}} \,)|^{2n}={\mathcal {P}}^{2n} \prec {\mathcal {E}}^{2n}, \end{aligned}$$
(7.1)

from which Proposition 6.1 follows by Chebyshev's inequality, as spelled out below. The rest of this section is therefore devoted to the proof of (7.1).
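For completeness, the Chebyshev step is the following routine computation: assuming (7.1), for any fixed \(\varepsilon >0\) and \(D>0\), Markov's inequality gives

$$\begin{aligned} {\mathbb {P}}\big (|P(z,{\underline{G}} \,)|\geqslant N^{\varepsilon }{\mathcal {E}}\big ) \leqslant \frac{{\mathbb {E}} |P(z,{\underline{G}} \,)|^{2n}}{N^{2n\varepsilon }{\mathcal {E}}^{2n}} \prec N^{-2n\varepsilon }, \end{aligned}$$

which is bounded by \(N^{-D}\) once n is chosen large enough depending on \(\varepsilon \) and D. Since \(\varepsilon \) and D are arbitrary, this is precisely the statement \(P(z,{\underline{G}} \,)\prec {\mathcal {E}}\).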

Set

$$\begin{aligned} Q_0({\underline{G}} \,):=P(z,{\underline{G}} \,)-\big (1+z{\underline{G}} \,+{\mathcal {Z}} {\underline{G}} \,^2\big ), \quad Q({\underline{G}} \,):=P(z,{\underline{G}} \,)-(1+z{\underline{G}} \,), \end{aligned}$$

and abbreviate

$$\begin{aligned} P\equiv P(z,{\underline{G}} \,),\quad P':=\partial _2 P(z,{\underline{G}} \,), \quad Q_0=Q_0({\underline{G}} \,) \quad \text {and} \quad Q=Q({\underline{G}} \,). \end{aligned}$$
(7.2)

Note that the argument z of G is random, and

$$\begin{aligned} \frac{\partial G_{kl}}{\partial H_{ij}}=-(G_{ki}G_{jl}+G_{li}G_{jk})(1+\delta _{ij})^{-1}+4N^{-1}(G^2)_{kl}H_{ij}(1+\delta _{ij})^{-1}, \end{aligned}$$
(7.3)

and as a result

$$\begin{aligned} \frac{\partial P}{\partial H_{ij}}= \big (-2P'N^{-1}(G^2)_{ij}+4P'N^{-1}H_{ij}{\underline{G^2}} \,+4N^{-1}H_{ij}({\underline{G}} \,+{\underline{G}} \,^2)\big )(1+\delta _{ij})^{-1}. \end{aligned}$$
(7.4)

We define the parameter

$$\begin{aligned} \Upsilon :=\frac{\Psi +\sqrt{\kappa +\eta }}{N\eta }. \end{aligned}$$
(7.5)

Recalling the random variable \(\Gamma \) from (5.3), we find

$$\begin{aligned} \Gamma \prec \Upsilon . \end{aligned}$$
(7.6)

Moreover, we have

$$\begin{aligned} (\Psi +\sqrt{\kappa +\eta })\Upsilon \geqslant \sqrt{\kappa +\eta }\cdot \frac{\sqrt{\kappa +\eta }}{N\eta } \geqslant N^{-1}. \end{aligned}$$
(7.7)

The next lemma collects basic estimates for the derivatives of P.

Lemma 7.1

Under the assumptions of Proposition 6.1, for any fixed \(k \in {\mathbb {N}}_+\) we have

$$\begin{aligned} P' \prec \Psi +\sqrt{|\kappa |+\eta }, \quad \quad \bigg |\frac{\partial ^k {\underline{G}} \,}{\partial H_{ij}^k}\bigg | \prec \max _{x,y} N^{-1}|(G^2)_{xy} |\prec \Upsilon \end{aligned}$$
(7.8)

and

$$\begin{aligned} \frac{\partial ^k P}{\partial H_{ij}^k} \prec \Big (\Psi +\sqrt{|\kappa |+\eta }\,\Big )\Upsilon . \end{aligned}$$
(7.9)

Proof

By the mean value theorem,

$$\begin{aligned} P'=\partial _2 P(z,{\underline{G}} \,)=\partial _2P(z,m)+\partial _2^2 P(z,\xi )(m-{\underline{G}} \,) \end{aligned}$$

for some \(\xi \) between m and \({\underline{G}} \,\). Then the first estimate in (7.8) is proved using Lemma 2.11 and (6.3). The second estimate in (7.8) is proved by Lemmas 2.6 and 2.8. By (7.4) and (7.8), one easily checks that

$$\begin{aligned} \frac{\partial ^k P}{\partial H_{ij}^k}\prec & {} \sup _{x,y}P' N^{-1}|(G^2)_{xy}|+\sup _{x,y}\big (N^{-1}|(G^2)_{xy}|\big )^2+N^{-1}\\\prec & {} (\Psi +\sqrt{\kappa +\eta })\Upsilon +N^{-1}, \end{aligned}$$

and combining with (7.7) one concludes (7.9). \(\square \)

7.1 The first expansion

By \((H-z)G=I\), we have

$$\begin{aligned} {\mathbb {E}} |P|^{2n}= & {} {\mathbb {E}} \Big ({\underline{HG}} \, +{\mathcal {Z}} {\underline{G}} \,^2+Q_0 \Big ) P^{n-1}P^{*n} \\= & {} {\mathbb {E}} \Big ({\mathcal {Z}} {\underline{G}} \,^2+Q_0 \Big ) P^{n-1}P^{*n}+\frac{1}{N} \sum _{i,j} {\mathbb {E}} H_{ij}G_{ji}P^{n-1}P^{*n}. \end{aligned}$$

We use Lemma 2.1 to calculate the last term. By setting \(h=H_{ij}\), \(f=f_{ji}(H)=G_{ji}P^{n-1}P^{*n}\), we get

$$\begin{aligned} \frac{1}{N} \sum _{i,j} {\mathbb {E}} H_{ij}G_{ji}P^{n-1}P^{*n}= & {} \frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k (G_{ij}P^{n-1}P^{*n})}{\partial H_{ij}^k} \bigg ] \nonumber \\&+O_{\prec }(N^{-4n}) \end{aligned}$$
(7.10)

where, as in (5.12), we choose a large enough \(\ell \in {\mathbb {N}}_+\) such that the remainder term is negligible. By splitting the differentials in (7.10) according to whether \(P,{{\overline{P}} \,}\) are differentiated, we have

$$\begin{aligned} {\mathbb {E}} |P|^{2n}&={\mathbb {E}} Q_0 P^{n-1}P^{*n}+{\mathbb {E}}{\mathcal {Z}} {\underline{G}} \,^2P^{n-1}P^{*n} \nonumber \\&\quad +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k} P^{n-1}P^{*n}\bigg ] \nonumber \\&\quad +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s( P^{n-1}P^{*n})}{\partial H_{ij}^s} \frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\bigg ] \nonumber \\&\quad +O_{\prec }(N^{-4n}) \nonumber \\&=:\text {(I)}+\text {(II)}+\text {(III)}+\text {(IV)}+O_{\prec }(N^{-4n}). \end{aligned}$$
(7.11)

We have the following result, which handles the terms on the right-hand side of (7.11) and directly implies (7.1).

Lemma 7.2

Let (I–IV) be as in (7.11). We have

$$\begin{aligned} \mathrm {(II)}+\mathrm {(IV)}\prec \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r} \end{aligned}$$
(7.12)

as well as

$$\begin{aligned} \mathrm {(I)}+\mathrm {(III)}\prec \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$
(7.13)

The rest of Sect. 7 is devoted to showing Lemma 7.2. To simplify notation, we drop the complex conjugates in (I–IV) (which play no role in the subsequent analysis), and estimate the quantities

$$\begin{aligned} \begin{aligned} \text {(II')}+\text {(IV')}&:={\mathbb {E}}{\mathcal {Z}} {\underline{G}} \,^2P^{2n-1} +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \\&\quad \quad \left[ \frac{\partial ^s P^{2n-1}}{\partial H_{ij}^s} \frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\right] \end{aligned} \end{aligned}$$
(7.14)

and

$$\begin{aligned} \text {(I')}+\text {(III')}:={\mathbb {E}} Q_0 P^{2n-1}+\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k} P^{2n-1}\bigg ]. \end{aligned}$$
(7.15)

7.2 Abstract polynomials, part II

In order to estimate (7.14) and (7.15), we introduce the following class of abstract polynomials, which generalizes the class \({\mathcal {T}}\) from Definition 5.1.

Definition 7.3

Let \(\{i_1,i_2,\ldots \}\) be an infinite set of formal indices. To integers \(s,k, \nu _1,\nu _3 \in {\mathbb {N}}\), digits \(\nu _4,\nu _5 \in \{0,1\}\) satisfying \(\nu _4 \leqslant \nu _5\), a real number \(\theta \in {\mathbb {R}}\), formal indices \(x,y,x_1, y_1, \ldots , x_k, y_k \in \{i_1, \ldots , i_{\nu _1}\}\), and a family \((a_{i_1,\ldots ,i_{\nu _1}})_{1\leqslant i_1,\ldots ,i_{\nu _1}\leqslant N}\) of uniformly bounded complex numbers we assign a formal monomial

$$\begin{aligned} V = a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta } (P')^{\nu _4}(N^{-1}(G^2)_{xy})^{\nu _5} G_{x_1y_1}G_{x_2y_2}\ldots G_{x_{k}y_{k}}{\underline{G}} \,^s P^{\nu _3}. \end{aligned}$$
(7.16)

We denote \(\sigma (V) = s+k\), \(\nu _i(V) = \nu _i\) for \(i = 1,3,4,5\), \(\theta (V) = \theta \), and

$$\begin{aligned} \nu _2(V) :={{\mathbf {1}}}_{x \ne y} + \sum _{l = 1}^k {{\mathbf {1}}}_{x_l \ne y_l}. \end{aligned}$$

We denote by \({\mathcal {V}}\) the set of formal monomials V of the form (7.16).

We extend the evaluation from Definition 5.2 to the set \({\mathcal {V}}\), and denote the evaluation of V as in (7.16) by \(V_{i_1, \ldots , i_{\nu _1}}\). We also extend the operation \({\mathcal {S}}\) from (5.2) to \({\mathcal {V}}\).

The next lemma is an analogue of Lemma 5.3, whose proof is postponed to Sect. 10.3.

Lemma 7.4

Let \(V \in {\mathcal {V}}\) and abbreviate \(\nu _i = \nu _i(V)\) and \(\theta = \theta (V)\). Suppose that \(\nu _2 \ne 0\).

  1. (i)

    We have

    $$\begin{aligned} {\mathbb {E}} {\mathcal {S}} (V)&\prec N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta })^{\nu _4} (N\eta )^{-\nu _5}\Upsilon {\mathbb {E}} |P^{\nu _3}| \\&\quad +\sum _{t=1}^{\nu _3} N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta })^{\nu _4} \Upsilon ^{\nu _5}((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{\nu _3-t}|. \end{aligned}$$
  2. (ii)

    Moreover, when \(\nu _4(V)=\nu _5(V)=0\), we have the stronger estimate

    $$\begin{aligned} {\mathbb {E}} {\mathcal {S}} (V)\prec & {} N^{\nu _1-\theta }\Upsilon {\mathbb {E}} |P^{\nu _3}|+ N^{\nu _1-\theta }\Upsilon ^{2}{\mathbb {E}} |P^{\nu _3-1}| \\&+\sum _{t=2}^{\nu _3} N^{\nu _1-\theta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{\nu _3-t}|. \end{aligned}$$

In the sequel, we also need the subset

$$\begin{aligned} {\mathcal {V}}_0 :=\{V\in {\mathcal {V}} :\nu _2(V)=0, \nu _4(V)=\nu _5(V)\}. \end{aligned}$$

In analogy to (5.5), we define an averaging map \({\mathcal {M}}\) from \({\mathcal {V}}_0\) to the space of random variables through

$$\begin{aligned} {\mathcal {M}}(V)=\sum _{i_1,\ldots ,i_{\nu _1}}a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta } (P'N^{-1}{\underline{G^2}} \,)^{\nu _4}\,{\underline{G}} \,^{s+k}P^{\nu _3} \end{aligned}$$

for

$$\begin{aligned} V =a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }(P'N^{-1}(G^2)_{xx})^{\nu _4}G_{x_1x_1}G_{x_2x_2}\ldots G_{x_{k}x_{k}}{\underline{G}} \,^sP^{\nu _3}. \end{aligned}$$

The following is an analogue of Lemma 5.5, whose proof is given in Sect. 10.4.

Lemma 7.5

Let \(V \in {\mathcal {V}}_0\). There exist \(V^{(1)},\ldots ,V^{(k)} \in {\mathcal {V}}_0\) such that, abbreviating \(\nu _i = \nu _i(V)\) and \(\theta = \theta (V)\),

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(V)&={\mathbb {E}} {\mathcal {M}}(V)+\sum _{l=1}^k{\mathbb {E}}\, {\mathcal {S}}\big (V^{(l)}\big )+O_{\prec }\big ( N^{\nu _1-\theta }\Upsilon ^{1+\nu _4}{\mathbb {E}} |P^{\nu _3}|\,\big )\\&\quad +\sum _{t=1}^{\nu _3} O_{\prec }\big (N^{\nu _1-\theta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{\nu _4+t}{\mathbb {E}} |P^{\nu _3-t}|\,\big ), \end{aligned}$$

where k is fixed, and each \(V^{(l)}\) satisfies \( V^{(l)} \in {\mathcal {V}}_0\), \(\sigma (V^{(l)})- \sigma (V) \in 2{\mathbb {N}}+4\), \(\nu _i(V^{(l)})=\nu _i(V)\) for \(i=2,3,4,5\),

$$\begin{aligned} \nu _1\big (V^{(l)}\big )=\nu _1(V)+1, \quad \text {and} \quad \theta \big (V^{(l)}\big )=\theta (V)+1+\beta \big (\sigma \big (V^{(l)}\big )-\sigma (V)-2\big ). \end{aligned}$$

As a result, each \(V^{(l)}\) satisfies

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}} \big (V^{(l)}\big ) \prec N^{\nu _1(V)-\theta (V)-2\beta }\big (\Psi +\sqrt{\kappa +\eta }\big ) \Upsilon {\mathbb {E}} |P^{\nu _3}|. \end{aligned}$$

Repeatedly using Lemma 7.5, and together with (7.6), we obtain the following result.

Lemma 7.6

Let \(V \in {\mathcal {V}}_0\) and abbreviate \(\nu _i = \nu _i(V)\), \(\theta = \theta (V)\), and \(\sigma =\sigma (V)\). Then there exist deterministic uniformly bounded \(b_1,\ldots ,b_{\lceil {\beta ^{-1}} \rceil }\) such that

$$\begin{aligned} {\mathcal {M}}_{\infty }(V):={\mathcal {M}}(V)+N^{\nu _1-\theta }\sum _{l=2}^{\lceil {\beta ^{-1}} \rceil } b_{l} N^{-l\beta } (P'N^{-1}{\underline{G^2}} \,)^{\nu _4}\,{\underline{G}} \,^{\sigma +2l} P^{\nu _3} \end{aligned}$$

satisfies

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V)= & {} {\mathbb {E}} {\mathcal {M}}_{\infty }(V) +O_{\prec }\big ( N^{\nu _1-\theta }\Upsilon ^{1+\nu _4}{\mathbb {E}} |P^{\nu _3}|\,\big ) \\&+\sum _{t=1}^{\nu _3} O_{\prec }\big (N^{\nu _1-\theta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{\nu _4+t}{\mathbb {E}} |P^{\nu _3-t}|\,\big ). \end{aligned}$$

Finally, we have the following extension of Lemma 5.6, which is proved in Sect. 10.5.

Lemma 7.7

Fix \(r,u,v\in {\mathbb {N}}\). Let \(T \in {\mathcal {T}}_0\) and let \({\mathcal {M}}(r,T)\) be as in Lemma 5.6. Then

$$\begin{aligned} {\mathbb {E}} [\partial _{w}({\mathcal {S}}(T)){\underline{G}} \,^u P^v]&={\mathbb {E}} [\partial _{w}({\mathcal {M}}(r, T)) {\underline{G}} \,^u P^v]\\&\quad +O_{\prec }\big (N^{\nu _1(T)-\theta (T)+1}\Upsilon ((N\eta )^{-1}+N^{-\beta (r+1)}){\mathbb {E}} |P|^v\big )\\&\quad +\sum _{t=1}^v O_{\prec }\big (N^{\nu _1(T)-\theta (T)+1}\Upsilon ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^t{\mathbb {E}} |P|^{v-t}\big ). \end{aligned}$$

7.3 The computation of (IV’) in (7.14)

We write \(\text {(IV')}=\sum _{k=1}^{\ell }X_k\), where

$$\begin{aligned} X_k:=\frac{1}{N}\frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s P^{2n-1}}{\partial H_{ij}^s} \frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\bigg ]. \end{aligned}$$
(7.17)

7.3.1 The estimate of \(X_1\)

By (7.4) and \({\mathcal {C}}_2(H_{ij})=N^{-1}(1+O(\delta _{ij}))\), we have

$$\begin{aligned} X_1&=\frac{1}{N}\sum _{i,j}{\mathcal {C}}_2(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial P^{2n-1}}{\partial H_{ij}} G_{ij}\bigg ] =\frac{2n-1}{N}\sum _{i,j}{\mathcal {C}}_2(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial P}{\partial H_{ij}} P^{2n-2}G_{ij} \bigg ] \nonumber \\&=\frac{2n-1}{N^2}\sum _{i,j} {\mathbb {E}} \big (-2P'N^{-1}(G^2)_{ij} +4P'N^{-1}H_{ij}{\underline{G^2}} \,+4N^{-1}H_{ij}({\underline{G}} \,+{\underline{G}} \,^2)\big )P^{2n-2}G_{ij} \nonumber \\&\quad +\frac{2n-1}{N^2}\sum _{i} (N {\mathcal {C}}_2(H_{ii})-2) \nonumber \\&\quad {\mathbb {E}} \big (-2P'N^{-1}(G^2)_{ii}+4P'N^{-1}H_{ii}{\underline{G^2}} \,+4N^{-1}H_{ii}({\underline{G}} \,+{\underline{G}} \,^2)\big )P^{2n-2}G_{ii}. \end{aligned}$$
(7.18)

Estimating the last term using Lemma 7.1, we conclude

$$\begin{aligned} X_1= & {} \frac{2n-1}{N^2}{\mathbb {E}} \big (-2P'{\underline{G^3}} \,+4P'{\underline{HG}} \,\,{\underline{G^2}} \,+4{\underline{HG}} \,({\underline{G}} \,+{\underline{G}} \,^2)\big )P^{2n-2} \nonumber \\&+O_{\prec }\big (N^{-1}(\Psi +\sqrt{\kappa +\eta }) \Upsilon \big ){\mathbb {E}} |P^{2n-2}|. \end{aligned}$$
(7.19)

By \(HG=zG+I\) and \(z \prec 1\), we deduce that \({\underline{HG}} \, \prec 1\). In addition, it is easy to check that \(N^{-2}{\underline{G^3}} \, \prec \Upsilon (N\eta )^{-1}\). Thus the first term on the right-hand side of (7.19) can be estimated by

$$\begin{aligned}&O_{\prec }\Big ((\Psi +\sqrt{\kappa +\eta })\Upsilon N^{-1}\eta ^{-1}+(\Psi +\sqrt{\kappa +\eta })\Upsilon N^{-1}+N^{-2}\Big ) {\mathbb {E}} |P^{2n-2}| \\&\quad \prec \Upsilon ^2 {\mathbb {E}} |P^{2n-2}|. \end{aligned}$$

As a result,

$$\begin{aligned} X_1 \prec \Upsilon ^2 {\mathbb {E}} |P^{2n-2}| \leqslant \Upsilon ^2 {\mathcal {P}}^{2n-2}, \end{aligned}$$

where in the second step we used Hölder’s inequality. Since

$$\begin{aligned} \Upsilon = \frac{\Psi +\sqrt{\kappa +\eta }}{N\eta } \leqslant \frac{\Psi +\sqrt{\kappa }}{N\eta }+\frac{1}{N\sqrt{\eta }} \leqslant (\Psi ^2+\kappa ) N^{-\delta } +\frac{1}{N^2\eta ^2}N^{\delta } +\frac{1}{N\sqrt{\eta }}\prec {\mathcal {E}} \end{aligned}$$

for all \(w \in {{\mathbf {Y}}}\), we have \(X_1 \prec {\mathcal {E}}^2 {\mathcal {P}}^{2n-2}\) as desired.
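Here the second inequality in the display above is an instance of the weighted AM–GM bound \(ab \leqslant \frac{1}{2}(a^2N^{-\delta }+b^2N^{\delta })\) with \(a=\Psi +\sqrt{\kappa }\) and \(b=(N\eta )^{-1}\):

$$\begin{aligned} \frac{\Psi +\sqrt{\kappa }}{N\eta } \leqslant \frac{(\Psi +\sqrt{\kappa })^2}{2}N^{-\delta }+\frac{N^{\delta }}{2N^2\eta ^2} \leqslant (\Psi ^2+\kappa )N^{-\delta }+\frac{N^{\delta }}{N^2\eta ^2}, \end{aligned}$$

using \((\Psi +\sqrt{\kappa })^2 \leqslant 2(\Psi ^2+\kappa )\).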

7.3.2 The estimate of \(X_2\)

Let us split

$$\begin{aligned} X_2= & {} X_{2,1}+X_{2,2}:=\frac{1}{2N}\sum _{i,j} {\mathcal {C}}_{3}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^2 P^{2n-1}}{\partial H_{ij}^2} G_{ij}\bigg ]\\&+\frac{1}{N}\sum _{i,j} {\mathcal {C}}_{3}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial P^{2n-1}}{\partial H_{ij}} \frac{\partial G_{ij}}{\partial H_{ij}}\bigg ]. \end{aligned}$$

Since \({\mathcal {C}}_3(H_{ij})=O(N^{-1-\beta })\), we have

$$\begin{aligned} X_{2,1}&\prec \frac{1}{N^{2+\beta }}\sum _{i,j}{\mathbb {E}}\Big |\frac{\partial ^2 P}{\partial H_{ij}^2} P^{2n-2}G_{ij}\Big |+\frac{1}{N^{2+\beta }}\sum _{i,j}{\mathbb {E}}\Big |\Big (\frac{\partial P}{\partial H_{ij}}\Big )^2 P^{2n-3}G_{ij}\Big | \nonumber \\&\prec \frac{(\Psi +\sqrt{\kappa +\eta })\Upsilon }{N^{2+\beta }}\sum _{i,j}{\mathbb {E}}|P^{2n-2}G_{ij}| + \frac{((\Psi +\sqrt{\kappa +\eta })\Upsilon )^2}{N^{2+\beta }}\sum _{i,j}{\mathbb {E}}|P^{2n-3}G_{ij}| \nonumber \\&\prec \frac{((\Psi +\sqrt{\kappa +\eta })\Upsilon )\sqrt{\Upsilon }}{N^{\beta }}{\mathbb {E}}|P^{2n-2}| + \frac{((\Psi +\sqrt{\kappa +\eta })\Upsilon )^2\sqrt{\Upsilon }}{N^{\beta }}{\mathbb {E}}|P^{2n-3}|, \end{aligned}$$
(7.20)

where in the second and third step we used Lemmas 2.8 and 7.1 respectively. Note that

$$\begin{aligned} \frac{(\Psi +\sqrt{\kappa +\eta })\Upsilon \sqrt{\Upsilon }}{N^{\beta }}= & {} \frac{(\Psi +\sqrt{\kappa +\eta })^{5/2}}{(N\eta )^{3/2}N^{\beta }} \nonumber \\\prec & {} \frac{(\Psi +\sqrt{\kappa })^4}{N^{\beta }}+\frac{1}{(N\eta )^4N^{\beta }}+\frac{1}{N^{3/2}\eta ^{1/4}N^{\beta }} \nonumber \\\prec & {} {\mathcal {E}}^2, \end{aligned}$$
(7.21)

and similarly

$$\begin{aligned} \frac{((\Psi +\sqrt{\kappa +\eta })\Upsilon )^2\sqrt{\Upsilon }}{N^{\beta }} \prec {\mathcal {E}}^3. \end{aligned}$$

Thus by (7.20) we get

$$\begin{aligned} X_{2,1}\prec {\mathcal {E}}^2 {\mathbb {E}} |P^{2n-2}|+{\mathcal {E}}^3 {\mathbb {E}} |P^{2n-3}| \prec {\mathcal {E}}^2 {\mathcal {P}}^{2n-2}+{\mathcal {E}}^3 {\mathcal {P}}^{2n-3} \end{aligned}$$
(7.22)

as desired. As for the term \(X_{2,2}\), we see from (7.3) that the most dangerous term is

$$\begin{aligned}&\frac{1}{N}\sum _{i,j} {\mathcal {C}}_3(H_{ij}) {\mathbb {E}} \Big [(2n-1)P^{2n-2} \frac{\partial P}{\partial H_{ij}}(-G_{ii}G_{jj})(1+\delta _{ij})^{-1}\Big ] \nonumber \\&\quad =\frac{1}{N^{2+\beta }}\sum _{i,j} a_{ij} {\mathbb {E}} \Big [P^{2n-2}\big (-2P'N^{-1}(G^2)_{ij}+4P'N^{-1}H_{ij}{\underline{G^2}} \,+4N^{-1}H_{ij}({\underline{G}} \,+{\underline{G}} \,^2)\big )G_{ii}G_{jj}\Big ] \nonumber \\&\quad =:X_{2,2,1}+X_{2,2,2}+X_{2,2,3}, \end{aligned}$$
(7.23)

where \(a_{ij}\) is deterministic and uniformly bounded. Note that we can write

$$ X_{2,2,1}={\mathbb {E}}{\mathcal {S}}(V), $$

where \(\nu _1(V)=2,\theta (V)=2+\beta ,\nu _3(V)=2n-2,\nu _4(V)=\nu _5(V)=1\). By Lemma 7.4 (i), we have

$$\begin{aligned} X_{2,2,1}&\prec N^{-\beta }(\Psi +\sqrt{\kappa +\eta }) (N\eta )^{-1}\Upsilon {\mathbb {E}} |P^{2n-2}|\\&\quad +\sum _{t=1}^{2n-2} N^{-\beta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t+1}{\mathbb {E}} |P^{2n-2-t}|. \end{aligned}$$

One can easily check

$$\begin{aligned} N^{-\beta }(\Psi +\sqrt{\kappa +\eta }) (N\eta )^{-1}\Upsilon \prec {\mathcal {E}}^2 \quad \text {and}\quad N^{-\beta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t+1} \prec {\mathcal {E}}^{2+t} \end{aligned}$$
(7.24)

for all \(t\geqslant 1\). Thus we have

$$\begin{aligned} X_{2,2,1} \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathbb {E}} |P^{2n-r} |\prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$
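For instance, the first bound in (7.24) is immediate from the definition (7.5) of \(\Upsilon \):

$$\begin{aligned} N^{-\beta }(\Psi +\sqrt{\kappa +\eta })(N\eta )^{-1}\Upsilon =N^{-\beta }\Upsilon ^2 \leqslant \Upsilon ^2 \prec {\mathcal {E}}^2, \end{aligned}$$

where in the last step we used \(\Upsilon \prec {\mathcal {E}}\) from Sect. 7.3.1.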

For \(X_{2,2,2}\), we can again apply Lemma 2.1 for \(h=H_{ij}\) and get

$$\begin{aligned} X_{2,2,2}=\frac{4}{N^2q}\sum _{i,j} a_{ij}\sum _{k=1}^\ell {\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \bigg [\frac{\partial ^k P^{2n-2} P'N^{-1}{\underline{G^2}} \,}{\partial H_{ij}^k}\bigg ]+O_{\prec }(N^{-4n}). \end{aligned}$$

Note that

$$\begin{aligned} \frac{\partial ^s P' N^{-1}{\underline{G^2}} \,}{\partial H_{ij}^s} \prec (\Psi +\sqrt{\kappa +\eta })\Upsilon \end{aligned}$$

for all fixed \(s\geqslant 0\). Together with (7.9) and the trivial bound \(N^{-1} \prec {\mathcal {E}}\), we see that

$$\begin{aligned} X_{2,2,2} \prec \frac{1}{Nq} \sum _{t=0}^{2n-2} ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^{t+1}{\mathcal {P}}^{2n-2-t} +{\mathcal {E}}^{4n}\prec \sum _{r=2}^{2n}{\mathcal {E}}^r {\mathcal {P}}^{2n-r}, \end{aligned}$$

where in the last step we also used \(\Upsilon \prec {\mathcal {E}}\). Similar steps also work for \(X_{2,2,3}\). As a result, we have \((7.23) \prec \sum _{r=2}^{2n}{\mathcal {E}}^r {\mathcal {P}}^{2n-r}\). Other terms in \(X_{2,2}\) can be estimated in a similar fashion, which leads to

$$\begin{aligned} X_{2,2} \prec \sum _{r=2}^{2n}{\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$

Combining with (7.22) we get

$$\begin{aligned} X_2\prec \sum _{r=2}^{2n}{\mathcal {E}}^r {\mathcal {P}}^{2n-r} \end{aligned}$$

as desired.

7.3.3 The computation of \(X_3\)

Let us split

$$\begin{aligned} X_{3}=X_{3,1}+X_{3,2}+X_{3,3}, \end{aligned}$$

where

$$\begin{aligned} X_{3,s}:=\frac{1}{N}\frac{1}{3!} {3 \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{4}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s P^{2n-1}}{\partial H_{ij}^s} \frac{\partial ^{3-s} G_{ij}}{\partial H_{ij}^{3-s}}\bigg ] \end{aligned}$$

for \(s=1,2,3\).

Step 1 When \(s=1,3\), it is easy to see from (7.3) that

$$\begin{aligned} \frac{\partial ^{3-s} G_{ij}}{\partial H_{ij}^{3-s}}\prec |G_{ij}|+ \Upsilon . \end{aligned}$$

Using (7.9), we can deduce

$$\begin{aligned} \frac{\partial ^s P^{2n-1}}{\partial H_{ij}^s} \prec \sum _{t=0}^{2n-2} ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^{t+1} P^{2n-2-t}. \end{aligned}$$

Thus

$$\begin{aligned} X_{3,s}&\prec \frac{1}{N^{2+2\beta }} \sum _{i,j}\sum _{t=0}^{2n-2} ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^{t+1} {\mathbb {E}} |P^{2n-2-t}(|G_{ij}|+ \Upsilon )|\\&\prec \frac{1}{N^{2\beta }} \sum _{t=0}^{2n-2} ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^{t+1} \sqrt{\Upsilon } {\mathbb {E}} |P^{2n-2-t}| \\&\prec \frac{1}{N^{2\beta }} \sum _{t=0}^{2n-2} ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^{t+1} \sqrt{\Upsilon } {\mathcal {P}}^{2n-2-t}, \end{aligned}$$

where in the second step we used Lemma 2.8. As in (7.21) and (7.24), we have

$$\begin{aligned} \frac{1}{N^{2\beta }}((\Psi +\sqrt{\kappa +\eta })\Upsilon )^{t+1} \sqrt{\Upsilon } \prec {\mathcal {E}}^{2+t} \end{aligned}$$

for all \( t \geqslant 0\). Thus \(X_{3,s} \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\) for \(s=1,3\).

Step 2 Let us consider

$$\begin{aligned} X_{3,2}= \frac{1}{2N}\sum _{i,j} {\mathcal {C}}_{4}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^2 P^{2n-1}}{\partial H_{ij}^2} \frac{\partial G_{ij}}{\partial H_{ij}}\bigg ]. \end{aligned}$$

Similarly to the previous steps, we can show that

$$\begin{aligned} X_{3,2}&=\frac{1}{2N}\sum _{i,j} {\mathcal {C}}_{4}(H_{ij}) {\mathbb {E}} \nonumber \\&\quad \bigg [ \frac{\partial ^2 P^{2n-1}}{\partial H_{ij}^2} \big (-G_{ii}G_{jj}-(1-\delta _{ij})G_{ij}^2+4N^{-1}(G^2)_{ij}H_{ij}(1+\delta _{ij})^{-1}\big )\bigg ] \nonumber \\&=-\frac{1}{2N}\sum _{i,j} {\mathcal {C}}_{4}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^2 P^{2n-1}}{\partial H_{ij}^2} G_{ii}G_{jj}\bigg ]+O_{\prec }({\mathcal {E}}^2 ){\mathbb {E}} |P^{2n-2}|. \end{aligned}$$
(7.25)

By Lemma 7.1 and (7.4), we have

$$\begin{aligned} \frac{\partial ^2 P^{2n-1}}{\partial H_{ij}^2}=(2n-1)P^{2n-2}\frac{\partial ^2 P}{\partial H_{ij}^2}+O_{\prec } ((\Psi +\sqrt{\kappa +\eta })^2\Upsilon ^2) P^{2n-3} \end{aligned}$$

and

$$\begin{aligned} \frac{\partial ^2 P}{\partial H_{ij}^2}&=\frac{\partial \big (-2P'N^{-1}(G^2)_{ij}+4P'N^{-1}H_{ij}{\underline{G^2}} \,+4N^{-1}H_{ij}({\underline{G}} \,+{\underline{G}} \,^2)\big )}{\partial H_{ij}}(1+\delta _{ij})^{-1}\\&=2P'N^{-1}\big ((G^2)_{ii}G_{jj}+(G^2)_{jj}G_{ii}\big )(1+\delta _{ij})^{-2} \\&\quad +4P'N^{-1}{\underline{G^2}} \,(1+\delta _{ij})^{-1}+4N^{-1}({\underline{G}} \,+{\underline{G}} \,^2)(1+\delta _{ij})^{-1}\\&\quad +O_{\prec }((\Psi +\sqrt{\kappa +\eta })\Upsilon ) (|G_{ij}|+N^{-1}\eta ^{-1}). \end{aligned}$$

Together with Lemma 2.8 we get

$$\begin{aligned} X_{3,2}&=-\frac{4n-2}{N^2}\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}\big [P^{2n-2}P'(G^2)_{ii}G_{ii}G^2_{jj} \\&\quad +P^{2n-2}P'{\underline{G^2}} \,G_{ii}G_{jj}+P^{2n-2} ({\underline{G}} \,+{\underline{G}} \,^2)G_{ii}G_{jj}\big ]\\&\quad +O_{\prec }\big ((\Psi +\sqrt{\kappa +\eta })\Upsilon ^{3/2}\big ){\mathbb {E}} |P^{2n-2}|+O_{\prec }({\mathcal {E}}^2 ){\mathbb {E}} |P^{2n-2}|. \end{aligned}$$

As the last two terms can be estimated by \(O_{\prec }({\mathcal {E}}^2{\mathcal {P}}^{2n-2})\), we have

$$\begin{aligned} X_{3,2}&=-\frac{4n-2}{N^2}\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}\big [ P^{2n-2}P'(G^2)_{ii}G_{ii}G^2_{jj}+P^{2n-2}P'{\underline{G^2}} \,G_{ii}G_{jj}\nonumber \\&\quad +P^{2n-2}({\underline{G}} \,+{\underline{G}} \,^2)G_{ii}G_{jj}\big ] \nonumber \\&\quad +O_{\prec }({\mathcal {E}}^2{\mathcal {P}}^{2n-2}) =:X_{3,2,1}+X_{3,2,2}+X_{3,2,3}+O_{\prec }\big ({\mathcal {E}}^2{\mathcal {P}}^{2n-2}\big ). \end{aligned}$$
(7.26)

Step 3 Let us compute \(X_{3,2,1}\). We write

$$\begin{aligned}&X_{3,2,1}={\mathbb {E}} {\mathcal {S}}(V), \quad \text {where} \nonumber \\&V_{ij}=-(4n-2)N^{-1}{\mathcal {C}}_4(H_{ij}) P^{2n-2}P'N^{-1}(G^2)_{ii} G_{ii}G^2_{jj}. \end{aligned}$$
(7.27)

Note that \(V \in {\mathcal {V}}_0\) with \(\nu _1(V)=2,\theta (V)=2+2\beta ,\nu _3(V)=2n-2\) and \(\nu _4(V)=\nu _5(V)=1\). By Lemma 7.6 we have

$$\begin{aligned}&X_{3,2,1}-{\mathbb {E}} {\mathcal {M}}_{\infty }(V) \prec N^{-2\beta } (\Psi +\sqrt{\kappa +\eta }) (N\eta )^{-1}\Upsilon {\mathbb {E}} |P^{2n-2}| \nonumber \\&\qquad +\sum _{t=1}^{2n-2} N^{-2\beta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t+1}{\mathbb {E}} |P^{2n-2-t}|, \end{aligned}$$
(7.28)

where

$$\begin{aligned} {\mathbb {E}} {\mathcal {M}}_{\infty }(V)= {\mathbb {E}} {\mathcal {M}}(V)+ N^{-2\beta }\sum _{l=2}^{\lceil {\beta ^{-1}} \rceil } b_{l} N^{-l\beta } {\mathbb {E}}P'N^{-1}{\underline{G^2}} \,\,{\underline{G}} \,^{3+2l} P^{2n-2}, \end{aligned}$$

and \(b_2,\ldots ,b_{\lceil {\beta ^{-1}} \rceil }\) are bounded. We can estimate the right-hand side of (7.28) by \(\sum _{r=2}^{2n} O_{\prec }({\mathcal {E}}^{r} {\mathcal {P}}^{2n-r})\), so that

$$\begin{aligned} X_{3,2,1}= & {} {\mathbb {E}} {\mathcal {M}}(V)+ N^{-2\beta }\sum _{l=2}^{\lceil {\beta ^{-1}} \rceil } b_{l} N^{-l\beta } {\mathbb {E}}P'N^{-1}{\underline{G^2}} \,\,{\underline{G}} \,^{3+2l} P^{2n-2} \nonumber \\&+\sum _{r=2}^{2n} O_{\prec }\big ({\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}\big ). \end{aligned}$$
(7.29)

Step 4 Let us consider the term \({\mathbb {E}} {\mathcal {M}}(V)\) in (7.29). Explicitly,

$$\begin{aligned} \begin{aligned} {\mathbb {E}} {\mathcal {M}}(V)&=-(4n-2)N^{2\beta -1}\sum _{i,j}{\mathcal {C}}_4(H_{ij}) \cdot N^{-2\beta } {\mathbb {E}} P'N^{-1} {\underline{G^2}} \,\,{\underline{G}} \,^3 P^{2n-2} \\&=:b_1 N^{-2\beta } {\mathbb {E}} P' N^{-1} {\underline{G^2}} \,\,{\underline{G}} \,^3 P^{2n-2}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} b_1=-(4n-2)N^{2\beta -1}\sum _{i,j}{\mathcal {C}}_4(H_{ij}) \end{aligned}$$
(7.30)

is bounded by Lemma 2.2. Since

$$\begin{aligned} \partial _{w} G_{ij}=(G^2)_{ij}, \end{aligned}$$
(7.31)

we have

$$ P'N^{-1}{\underline{G^2}} \,=N^{-1}(\partial _w P -{\underline{G}} \,)=N^{-1}\Big (\partial _{w}({\underline{HG}} \,)+\partial _{w}(Q)-{\underline{G}} \,\Big ). $$
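Here the first equality can be seen from the chain rule: since \(\partial _w z=1\) by (6.1), \(\partial _w {\underline{G}} \,={\underline{G^2}} \,\) by (7.31), and \(\partial _1 P(z,\zeta )=\zeta \) by the definition of P, we have

$$\begin{aligned} \partial _w P=(\partial _1 P)(z,{\underline{G}} \,)+P'\,\partial _w{\underline{G}} \,={\underline{G}} \,+P'\,{\underline{G^2}} \,. \end{aligned}$$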

In addition,

$$\begin{aligned} b_{1} N^{-2\beta -1} {\mathbb {E}}\partial _w(Q){\underline{G}} \,^{3} P^{2n-2}=b_{1} N^{-2\beta -1} {\mathbb {E}}\partial _w(Q_0){\underline{G}} \,^{3} P^{2n-2} + O_{\prec }\big ({\mathcal {E}}^2 {\mathcal {P}}^{2n-2}\big ). \end{aligned}$$

Thus,

$$\begin{aligned} {\mathbb {E}} {\mathcal {M}}(V)&= b_{1} N^{-2\beta -1} {\mathbb {E}}\partial _{w}({\underline{HG}} \,){\underline{G}} \,^3P^{2n-2}+ b_{1} N^{-2\beta -1} {\mathbb {E}}\partial _w(Q_0){\underline{G}} \,^{3} P^{2n-2} \nonumber \\&\quad -N^{-2\beta -1} b_{1} {\mathbb {E}} {\underline{G}} \,^{4} P^{2n-2} + O_{\prec }\big ({\mathcal {E}}^2 {\mathcal {P}}^{2n-2}\big ) \nonumber \\&=:\text {(A)}+\text {(B)}+\text {(C)}+ O_{\prec }\big ({\mathcal {E}}^2 {\mathcal {P}}^{2n-2}\big ). \end{aligned}$$
(7.32)

Step 5 We expand the term (A) again by Lemma 2.1, and get

$$\begin{aligned} \text {(A)}=b_{1} N^{-2-2\beta }\sum _{k=1}^{\ell }\sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \bigg [\frac{\partial ^k \partial _{w}(G_{ji}) {\underline{G}} \,^3 P^{2n-2}}{\partial H_{ij}^k}\bigg ]+O_{\prec }({\mathcal {E}}^{2n}). \end{aligned}$$

By Lemma 7.1, whenever the derivative \(\partial ^k/\partial H_{ij}^k\) on the right-hand side hits \({\underline{G}} \,^3P^{2n-2}\), the corresponding term can be bounded by \(O_{\prec }\big (\sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\big )\). Furthermore, since \(\partial ^k/\partial H_{ij}^k\) commutes with \(\partial _{w}\),

$$\begin{aligned} (A)&=\frac{b_1}{N^{2+2\beta }}\sum _{k=1}^{\ell }\sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \left[ \partial _{w}\left( \frac{\partial ^kG_{ji}}{\partial H_{ij}^k}\right) {\underline{G}} \,^3 P^{2n-2}\right] \\&\quad +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) \\&=:\sum _{k=1}^{\ell }Y_k +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$

The analysis of \(Y_k\) is similar to that of \({\widetilde{X}}_k\) in Sect. 5.2. For \(k=1\), by (7.3), (7.31) and Lemma 7.4 (i), we have

$$\begin{aligned} Y_1&=\frac{b_1}{N^{3+2\beta }}\sum _{i,j}{\mathbb {E}} \big [\partial _{w} \big (-G_{ii}G_{jj}-G_{ij}^2\big ) {\underline{G}} \,^3P^{2n-2}\big ]+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) \nonumber \\&=-\frac{b_1}{N^{1+2\beta }}{\mathbb {E}} \big [\partial _{w} ({\underline{G}} \,^2) {\underline{G}} \,^3P^{2n-2}\big ]-\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] \nonumber \\&\quad +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.33)

For \(k=2\), by (7.3) and (7.31) we see that the most dangerous term is

$$\begin{aligned} \frac{b_1}{N^{1+2\beta }}\sum _{i,j}{\mathcal {C}}_3(H_{ij}){\mathbb {E}} \big [G_{ii}G_{jj}N^{-1}(G^2)_{ij} {\underline{G}} \,^3P^{2n-2}\big ] =:{\mathbb {E}}{\mathcal {S}} ({\widetilde{V}}). \end{aligned}$$

Since \({\mathcal {C}}_3(H_{ij})=O(N^{-1-\beta })\), we see that \(\nu _1({\widetilde{V}}) = 2\), \(\nu _2({\widetilde{V}})=1\), \(\nu _4({\widetilde{V}})=0\), \(\nu _5({\widetilde{V}})=1\), and \(\theta ({\widetilde{V}}) = 2 + 3 \beta \). Thus, by Lemma 7.4 (i),

$$\begin{aligned} {\mathbb {E}}{\mathcal {S}} ({\widetilde{V}})\prec & {} N^{-3\beta }\Upsilon (N\eta )^{-1} {\mathbb {E}} |P|^{2n-2} +\sum _{t=1}^{2n-2} N^{-3\beta } \Upsilon ((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{2n-2-t}| \\\prec & {} \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}, \end{aligned}$$

where in the last step we again used \(w \in {{\mathbf {Y}}}\) and Hölder’s inequality. Other terms in \(Y_2\) also satisfy the same bound. A similar estimate can also be obtained for all even k, which yields

$$\begin{aligned} \sum _{s=1}^{\lfloor {\ell /2} \rfloor } Y_{2s} \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$
(7.34)

For odd \(k \geqslant 3\), we split

$$\begin{aligned} Y_k=Y_{k,1}+Y_{k,2}, \end{aligned}$$

where by definition terms in \(Y_{k,1}\) contain no off-diagonal entries of G or \(G^2\). Using Lemma 7.4 (i), we can again show that

$$\begin{aligned} Y_{k,2} \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$

By Lemma 2.2, we see that

$$\begin{aligned} Y_{k,1}=-\frac{b_1}{N^{3+(k+1)\beta }}\sum _{i,j}a^{(k)}_{ij}{\mathbb {E}} \big [\partial _{w}\big (G^{(k+1)/2}_{ii}G^{(k+1)/2}_{jj}\big ) {\underline{G}} \,^3P^{2n-2}\big ], \end{aligned}$$

where \(a^{(k)}_{ij}\) is deterministic and uniformly bounded. Combining with (7.33)–(7.34), we obtain

$$\begin{aligned}&\text {(A)}+\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ]+\frac{b_1}{N^{1+2\beta }}{\mathbb {E}} \big [\partial _{w} ({\underline{G}} \,^2) {\underline{G}} \,^3P^{2n-2}\big ] \nonumber \\&\quad =-\sum _{s=2}^{\lceil {\ell /2} \rceil }\frac{b_1}{N^{3+2s\beta }}\sum _{i,j} a^{(2s-1)}_{ij}{\mathbb {E}} \big [\partial _{w}\big (G^{s}_{ii}G_{jj}^{s}\big ){\underline{G}} \,^3P^{2n-2}\big ]+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) \nonumber \\&\quad =:-\sum _{s=2}^{\lceil {\ell /2} \rceil }\frac{b_1}{N^{1+2\beta }} {\mathbb {E}}\big [\partial _{w}\big ({\mathcal {S}}\big (T^{(s)}\big )\big ){\underline{G}} \,^3P^{2n-2}\big ] +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) , \end{aligned}$$
(7.35)

where

$$\begin{aligned} T^{(s)}=\frac{1}{N^{2+(2s-2)\beta }} a_{ij}^{(2s-1)} G_{ii}^sG_{jj}^s\in {\mathcal {T}}_0. \end{aligned}$$
(7.36)

By Lemma 7.7, we have

$$\begin{aligned}&\frac{b_1}{N^{1+2\beta }} {\mathbb {E}}[\partial _{w}\big ({\mathcal {S}}\big (T^{(s)}\big )\big ){\underline{G}} \,^3P^{2n-2}]\\&\quad =\frac{b_1}{N^{1+2\beta }} {\mathbb {E}}\big [\partial _{w}\big ({\mathcal {M}}\big (\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)}\big )\big ){\underline{G}} \,^3P^{2n-2}\big ]\\&\qquad +O_{\prec }\big (N^{-2s\beta }\Upsilon ((N\eta )^{-1}+N^{-1+(2s-2)\beta }){\mathbb {E}} |P|^{2n-2}\big ) \\&\qquad +\sum _{t=1}^{2n-2} O_{\prec }\big (N^{-2s\beta }\Upsilon ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^t{\mathbb {E}} |P|^{2n-2-t}\big ). \end{aligned}$$

Since \(s \geqslant 2\), one readily checks that the last two terms can be bounded by \(O_{\prec }(\sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r})\). Thus (7.35) reads

$$\begin{aligned} \text {(A)}&=\frac{b_1}{N^{1+2\beta }}\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}}\big [ \partial _{w}\big ({\mathcal {M}}\big (\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)}\big )\big ){\underline{G}} \,^3P^{2n-2}\big ] \nonumber \\&\quad -\frac{b_1}{N^{1+2\beta }}{\mathbb {E}} \big [\partial _{w} ({\underline{G}} \,^2) {\underline{G}} \,^3P^{2n-2}\big ] \nonumber \\&\quad -\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ]+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.37)

Note that by construction, \(T^{(s)}\) in (7.36) is the same as in (5.16). From Lemma 7.7, we see that the term \({\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)})\) in (7.37) is the same as in (5.18), which implies

$$\begin{aligned} {\underline{G}} \,^2-\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} {\mathcal {M}}\big (\lceil {\beta ^{-1}} \rceil -2s+2,T^{(s)}\big )=Q_0({\underline{G}} \,). \end{aligned}$$

Thus (7.37) reduces to

$$\begin{aligned} \text {(A)}= & {} -\frac{b_1}{N^{1+2\beta }}{\mathbb {E}} \big [\partial _{w} (Q_0) {\underline{G}} \,^3P^{2n-2}\big ] \nonumber \\&-\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ]+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.38)

Final Step By (7.32) and (7.38), we see that there is a cancellation between \(\mathrm {(A)}\) and \(\mathrm {(B)}\), which leads to

$$\begin{aligned} {\mathbb {E}} {\mathcal {M}}(V)= & {} -\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] -\frac{b_1}{N^{1+2\beta }} {\mathbb {E}} {\underline{G}} \,^{4} P^{2n-2} \nonumber \\&+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.39)

The first two terms on the right-hand side of (7.39) are stochastically dominated by

$$\begin{aligned} \frac{\Upsilon }{N\eta }{\mathbb {E}}|P^{2n-2}|+\frac{1}{N^{1+2\beta }}{\mathbb {E}} |P^{2n-2}|, \end{aligned}$$
(7.40)

and one can check that \(\Upsilon /(N\eta )\gg {\mathcal {E}}^2\) and \(N^{-1-2\beta } \gg {\mathcal {E}}^2\), so that we need to keep track of these terms in order to obtain a further cancellation.

So far we have been dealing with \({\mathbb {E}} {\mathcal {M}}(V)\) in (7.29), and the other terms in (7.29) can be handled in the same way as in Steps 4 and 5. Compared to \({\mathbb {E}}{\mathcal {M}}(V)\), each \(N^{-2\beta }b_{l} N^{-l\beta } {\mathbb {E}}P'N^{-1}{\underline{G^2}} \,\,{\underline{G}} \,^{3+2l} P^{2n-2}\) contains an additional factor \(N^{-l\beta }\). Similarly to (7.39) and (7.40), it can be shown that

$$\begin{aligned}&N^{-2\beta }b_{l} N^{-l\beta } {\mathbb {E}}P'N^{-1}{\underline{G^2}} \,\,{\underline{G}} \,^{3+2l} P^{2n-2} \\&\quad \prec N^{-l\beta }\Big (\frac{\Upsilon }{N\eta }{\mathbb {E}}|P^{2n-2}|+\frac{1}{N^{1+2\beta }}{\mathbb {E}} |P^{2n-2}|\Big ) +\sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r} \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r} \end{aligned}$$

for all \(l \geqslant 2\). As a result, we have

$$\begin{aligned} X_{3,2,1}=-\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] -\frac{b_1}{N^{1+2\beta }} {\mathbb {E}} \big [{\underline{G}} \,^{4} P^{2n-2}\big ] +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) ,\nonumber \\ \end{aligned}$$
(7.41)

where \(b_1\) is defined as in (7.30).

Next, we consider the other terms on the right-hand side of (7.26). Similarly to (7.41), we can also show that

$$\begin{aligned} X_{3,2,2}= & {} -\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^2P^{2n-2}\big ] -\frac{b_1}{N^{1+2\beta }} {\mathbb {E}} \big [{\underline{G}} \,^{3} P^{2n-2}\big ]\\&+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) \end{aligned}$$

as well as

$$\begin{aligned} X_{3,2,3}=\frac{b_1}{N^{1+2\beta }} {\mathbb {E}} \big [({\underline{G}} \,^3+{\underline{G}} \,^4) P^{2n-2}\big ]+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$

Note that this results in two cancellations on the right-hand side of (7.26), and we have

$$\begin{aligned} X_{3,2}=-\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] -\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^2P^{2n-2}\big ] +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$

As we have already estimated \(X_{3,1}\) and \(X_{3,3}\) in Step 1, we conclude that

$$\begin{aligned} X_{3}=-\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] -\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^2P^{2n-2}\big ] +O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) .\nonumber \\ \end{aligned}$$
(7.42)

Remark 7.8

The crucial step in analysing \(X_3\) is the computation of \(X_{3,2,1}\) in (7.41). As in (7.27), we can write \(X_{3,2,1}={\mathbb {E}}{\mathcal {S}}(V)\), with \(V\in {\mathcal {V}}_0\), \(\nu _1(V)-\theta (V)=-2\beta \), \(\nu _3(V)=2n-2\), and \(\nu _4(V)=\nu _5(V)=1\). Since

$$\begin{aligned} \Big |\frac{1}{N^2}{\underline{G^3}} \,\Big |\leqslant \frac{{{\,\mathrm{Im}\,}}{\underline{G}} \,}{N^2\eta ^2}\leqslant \frac{\Upsilon }{N\eta }, \end{aligned}$$

the formula (7.41) implies the estimate

$$\begin{aligned} X_{3,2,1} \prec \Big (\frac{1}{N^{1+2\beta }}+\frac{\Upsilon }{N^{1+2\beta }\eta }\Big ){\mathcal {P}}^{2n-2}+ \sum _{r=2}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$
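The first display of this remark can be verified as follows, using the spectral decomposition of G together with \({{\,\mathrm{Im}\,}}{\underline{G}} \,\prec \Psi +\sqrt{\kappa +\eta }\) (by (6.4) and Lemma 2.11): writing \(\mu _1 \leqslant \cdots \leqslant \mu _N\) for the eigenvalues of H,

$$\begin{aligned} \Big |\frac{1}{N^2}{\underline{G^3}} \,\Big |\leqslant \frac{1}{N^3}\sum _i \frac{1}{|\mu _i-z|^3} \leqslant \frac{1}{N^3\eta ^2}\sum _i \frac{\eta }{|\mu _i-z|^2} =\frac{{{\,\mathrm{Im}\,}}{\underline{G}} \,}{N^2\eta ^2} \prec \frac{\Psi +\sqrt{\kappa +\eta }}{N^2\eta ^2}=\frac{\Upsilon }{N\eta }. \end{aligned}$$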

The argument for \(X_{3,2,1}\) can be repeated for general \({\mathbb {E}} {\mathcal {S}}(V)\), which allows one to show the following result.

Lemma 7.9

Let \(V\in {\mathcal {V}}_0\), with \(\nu _1(V)-\theta (V)\leqslant -2\beta \), \(\nu _3(V)=2n-2\), and \(\nu _4(V)=\nu _5(V)=1\). Then

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V) \prec \Big (\frac{1}{N^{1+\theta -\nu _1}}+\frac{\Upsilon }{N^{1+\theta -\nu _1}\eta }\Big ){\mathcal {P}}^{2n-2}+ \sum _{r=2}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$

7.3.4 Conclusion

After the steps in Sects. 7.3.1–7.3.3, it remains to estimate \(X_k\) for \(k\geqslant 4\).

When \(k\geqslant 4\) is even, the estimate of \(X_k\) is similar to that of \(X_2\) in Sect. 7.3.2. In fact, by Lemma 2.2, we see that there will be additional factors of \(N^{-\beta }\) in \(X_k\) when \(k \geqslant 4\), which makes the estimate easier. Using Lemma 7.4 (i), one can show that

$$\begin{aligned} \sum _{s=2}^{\lceil {\ell /2} \rceil }X_{2s} \prec \sum _{r=2}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$

When \(k\geqslant 4\) is odd, the estimate of \(X_k\) is similar to that of \(X_3\) in Sect. 7.3.3. By Lemma 2.2, we see that there will be additional factors of \(N^{-(k-2)\beta }\) in \(X_k\), \(k\geqslant 4\). Using Lemmas 7.1, 7.4 and 7.9, one can show that

$$\begin{aligned} \sum _{s=2}^{\lceil {\ell /2} \rceil }X_{2s+1} \prec \Big (\frac{1}{N^{1+4\beta }}+\frac{\Upsilon }{N^{1+4\beta }\eta }\Big ){\mathcal {P}}^{2n-2}+ \sum _{r=2}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r} \prec \sum _{r=2}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$

As a result, we arrive at

$$\begin{aligned} \text {(IV')}= & {} \sum _{k=1}^{\ell }X_k =-\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] -\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^2P^{2n-2}\big ] \nonumber \\&+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) , \end{aligned}$$
(7.43)

where \(b_1=-(4n-2)N^{2\beta -1}\sum _{i,j}{\mathcal {C}}_4(H_{ij})\) is bounded.

7.4 The computation of (II’) in (7.14)

Using Lemma 2.1 with \(h=H_{ij}\), we have

$$\begin{aligned} \text {(II')}&={\mathbb {E}} {\mathcal {Z}} {\underline{G}} \,^2P^{2n-1}=\frac{1}{N}\sum _{i,j} {\mathbb {E}} \Big (H_{ij}^2-\frac{1}{N}\Big ) {\underline{G}} \,^2 P^{2n-1}=\frac{1}{N}\sum _{k=1}^{\ell } \frac{1}{k!} \nonumber \\&\sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \bigg [\frac{\partial ^k H_{ij}{\underline{G}} \,^2P^{2n-1}}{\partial H_{ij}^k}\bigg ] +O_{\prec }({\mathcal {E}}^{2n})-{\mathbb {E}}{\underline{G}} \,^2 P^{2n-1}. \end{aligned}$$
(7.44)

For each k, we write

$$\begin{aligned}&\frac{1}{N} \frac{1}{k!} \sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \bigg [\frac{\partial ^k H_{ij}{\underline{G}} \,^2P^{2n-1}}{\partial H_{ij}^k}\bigg ] =\frac{1}{N} \frac{1}{k!} \\&\quad \sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \bigg [H_{ij}\frac{\partial ^k {\underline{G}} \,^2P^{2n-1}}{\partial H_{ij}^k}\bigg ] \\&\quad +\frac{1}{N} \frac{1}{(k-1)!} \sum _{i,j}{\mathcal {C}}_{k+1}(H_{ij}){\mathbb {E}} \bigg [\frac{\partial ^{k-1} {\underline{G}} \,^2P^{2n-1}}{\partial H_{ij}^{k-1}}\bigg ]=:Z_k+{\widehat{X}}_k. \end{aligned}$$

Each \(Z_k\) can be handled again by applying Lemma 2.1 with \(h=H_{ij}\). One easily shows that

$$\begin{aligned} \sum _{k=1}^{\ell }Z_k \prec \sum _{r=1}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$
(7.45)

By \({\mathcal {C}}_2(H_{ij})=N^{-1}(1+O(\delta _{ij}))\), we have

$$\begin{aligned} {\widehat{X}}_{1}= & {} \frac{1}{N}\sum _{i,j} \frac{1}{N} {\mathbb {E}} {\underline{G}} \,^2P^{2n-1}+\frac{1}{N}\sum _{i} \Big ( {\mathcal {C}}_2(H_{ii})-\frac{1}{N}\Big ) {\mathbb {E}} {\underline{G}} \,^2 P^{2n-1} \\= & {} {\mathbb {E}} {\underline{G}} \,^2P^{2n-1}+O_{\prec }(N^{-1}{\mathcal {P}}^{2n-1}). \end{aligned}$$

Combining with (7.44) and (7.45), we have

$$\begin{aligned} \mathrm {(II')}= \sum _{k=2}^{\ell }{\widehat{X}}_{k}+O_{\prec }\left( \sum _{r=1}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.46)

The analysis of \({\widehat{X}}_{k}\) is similar to that of \(X_k\) in Sect. 7.3, and we only sketch the key steps.

For \(k=2\), we see from (7.3) that the most dangerous term in \({\widehat{X}}_{2}\) is

$$\begin{aligned} \frac{1}{N}\sum _{i,j}{\mathcal {C}}_{3}(H_{ij}){\mathbb {E}} \bigg [{\underline{G}} \,^2 (2n-1)P^{2n-2}\frac{\partial P}{\partial H_{ij}}\bigg ], \end{aligned}$$
(7.47)

which is very close to the left-hand side of (7.23). We can apply Lemma 7.4 (i) and show that (7.47) is bounded by \(O_{\prec }(\sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r})\). Similarly, we can also handle all the other terms in \({\widehat{X}}_{2}\), which leads to

$$\begin{aligned} {\widehat{X}}_{2} \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$
(7.48)

For \(k=3\), by the differential rule (7.3), we see that the most dangerous term in \({\widehat{X}}_{3}\) is

$$\begin{aligned} {\widehat{X}}_{3,2}:=\frac{1}{2N}\sum _{i,j}{\mathcal {C}}_{4}(H_{ij}){\mathbb {E}} \bigg [{\underline{G}} \,^2 \frac{\partial ^2 P^{2n-1}}{\partial H_{ij}^2}\bigg ], \end{aligned}$$

which is very close to the right-hand side of (7.25). Similarly to (7.26), we have

$$\begin{aligned}&{\widehat{X}}_{3,2}=\frac{4n-2}{N^2} \nonumber \\&\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}\big [P^{2n-2}P'(G^2)_{ii}G_{jj}{\underline{G}} \,^2 +P^{2n-2}P'{\underline{G^2}} \,\,{\underline{G}} \,^2+P^{2n-2}({\underline{G}} \,+{\underline{G}} \,^2){\underline{G}} \,^2\big ]\\&\quad +O_{\prec }\big ({\mathcal {E}}^2{\mathcal {P}}^{2n-2}\big ) =:{\widehat{X}}_{3,2,1}+{\widehat{X}}_{3,2,2}+{\widehat{X}}_{3,2,3}+O_{\prec }\big ({\mathcal {E}}^2{\mathcal {P}}^{2n-2}\big ), \end{aligned}$$

and the right-hand side can be computed similarly to \(X_{3,2,1}, X_{3,2,2}, X_{3,2,3}\) in (7.26). As a result, we can show that

$$\begin{aligned} {\widehat{X}}_3= & {} \frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] +\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^2P^{2n-2}\big ] \nonumber \\&+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) , \end{aligned}$$
(7.49)

where \(b_1=-(4n-2)N^{2\beta -1}\sum _{i,j}{\mathcal {C}}_4(H_{ij})\).

For \(k \geqslant 4\), the argument is similar to that in Sect. 7.3.4. We can show that

$$\begin{aligned} \sum _{k=4}^{\ell }{\widehat{X}}_k \prec \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}. \end{aligned}$$

Combining the above with (7.46)–(7.49), we have

$$\begin{aligned} \text {(II')}= & {} \frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^3P^{2n-2}\big ] +\frac{b_1}{N^{2+2\beta }}{\mathbb {E}} \big [{\underline{G^3}} \,\, {\underline{G}} \,^2P^{2n-2}\big ] \nonumber \\&+O_{\prec }\left( \sum _{r=2}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.50)

Now observe the cancellation between (7.43) and (7.50), which leads to

$$\begin{aligned} \text {(II')}+\text {(IV')} \prec \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r} \end{aligned}$$

as desired.

7.5 The estimate of (7.15)

From the construction of \(P_0\) in Sect. 5.2, we can easily show that

$$\begin{aligned} {\mathbb {E}} Q_0 +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k} \bigg ] \prec \Upsilon , \end{aligned}$$

and in this section we shall see that the analogue holds when the factor \(P^{2n-1}\) is added inside the expectations. Let us write

$$\begin{aligned} \sum _{k=1}^{\ell } X^{(1)}_k:=\sum _{k=1}^\ell \frac{1}{N}\frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k} P^{2n-1}\bigg ]=\mathrm {(III')} \end{aligned}$$

and analyse each \(X_k^{(1)}\).

Let us first consider the case when k is odd. For \(k=1\), it is easy to see from (7.3) and Lemma 2.8 that

$$\begin{aligned} X_1^{(1)}= & {} \frac{1}{N^2}\sum _{i,j}{\mathbb {E}}\big [-G_{ii}G_{jj}P^{2n-1}\big ]+O_{\prec }\big (\Upsilon {\mathbb {E}} |P^{2n-1}|\big )\nonumber \\= & {} -{\mathbb {E}} {\underline{G}} \,^2 P^{2n-1}+O_{\prec } \big ({\mathcal {E}} {\mathcal {P}}^{2n-1}\big ). \end{aligned}$$
(7.51)

For odd \(k \geqslant 3\), we see from (7.3) and Lemma 2.8 that

$$\begin{aligned} X_k^{(1)}=\frac{1}{N^{2+(k-1)\beta }} \sum _{i,j} a_{ij}^{(k)}{\mathbb {E}} \big [G_{ii}^{(k+1)/2}G_{jj}^{(k+1)/2}P^{2n-1}\big ]+O_{\prec } \big ({\mathcal {E}} {\mathcal {P}}^{2n-1}\big ), \end{aligned}$$
(7.52)

where \(a^{(k)}_{ij}\) is deterministic and uniformly bounded.

For even k, we follow a strategy similar to that of Sect. 7.3.2. We see from (7.3) and Lemma 2.8 that

$$\begin{aligned} X_k^{(1)}=\frac{1}{N^{2+(k-1)\beta }} \sum _{i,j} a_{ij}^{(k)}{\mathbb {E}} \big [G_{ij}G_{ii}^{k/2}G_{jj}^{k/2}P^{2n-1}\big ]+O_{\prec } \big ({\mathcal {E}} {\mathcal {P}}^{2n-1}\big ), \end{aligned}$$
(7.53)

where \(a^{(k)}_{ij}\) is deterministic and uniformly bounded. The first term on the right-hand side of (7.53) can be written as \({\mathbb {E}} {\mathcal {S}}(V)\), where \(V\in {\mathcal {V}}\), \(\nu _2(V)\ne 0\) and \(\nu _4(V)=\nu _5(V)=0\). Thus we can apply Lemma 7.4 (ii) to estimate this term, and show that it is bounded by \( O_{\prec }(\sum _{r=1}^{2n} {\mathcal {E}}^r {\mathcal {P}}^{2n-r})\). This implies

$$\begin{aligned} \sum _{s=2}^{\lceil {\ell /2} \rceil }X^{(1)}_{2s} \prec \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$
(7.54)

Combining (7.51), (7.52) and (7.54), we have

$$\begin{aligned} \text {(III')}&=-{\mathbb {E}}[ {\underline{G}} \,^2 P^{2n-1}]+\sum _{s=2}^{\lceil {\ell /2} \rceil }\frac{1}{N^{2+(2s-2)\beta }}\sum _{i,j}a_{ij}^{(2s-1)}{\mathbb {E}} \big [G_{ii}^sG_{jj}^sP^{2n-1}\big ] \nonumber \\&\quad +O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}\right) \nonumber \\&=:-{\mathbb {E}} [{\underline{G}} \,^2 P^{2n-1}]+\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} [{\mathcal {S}}(T^{(s)})P^{2n-1}]+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}\right) , \end{aligned}$$
(7.55)

where we recall the definition of \({\mathcal {S}}(T)\) in (5.2). Observe that from the above steps, \(T^{(s)}\) in (7.55) is the same as in (5.16). To handle \({\mathbb {E}}[ {\mathcal {S}}(T^{(s)})P^{2n-1}]\), we introduce the following analogue of Lemmas 5.6 and 7.7.

Lemma 7.10

Let \(T \in {\mathcal {T}}_0\) with \(\nu _1(T)-\theta (T) \leqslant -2\beta \). Fix \(r \in {\mathbb {N}}\) and let \({\mathcal {M}}(r,T)\) be as in Lemma 5.6. Then

$$\begin{aligned} {\mathbb {E}} [{\mathcal {S}}(T)P^{2n-1}]= & {} {\mathbb {E}} [{\mathcal {M}}(r, T)P^{2n-1}]+O_{\prec }\big (N^{\nu _1(T)-\theta (T)}\big (\Upsilon +N^{-\beta (r+1)}\big ){\mathcal {P}}^{2n-1}\big ) \\&+O_{\prec }\left( \sum _{r'=1}^{2n}{\mathcal {E}}^{r'} {\mathcal {P}}^{2n-r'}\right) . \end{aligned}$$

Proof

The proof is analogous to those of Lemmas 5.6 and 7.7. We use the identity

$$\begin{aligned} G_{ii}={\underline{G}} \,+G_{ii}{\underline{HG}} \,-(HG)_{ii}{\underline{G}} \, \end{aligned}$$

to replace the diagonal entries in \({\mathcal {S}}(T)\), and then expand the terms containing H using Lemma 2.1. We omit the details. \(\square \)

By Lemma 7.10 we have, for any \(s \in \{2,3,\ldots ,\lceil {\ell /2} \rceil \}\),

$$\begin{aligned} {\mathbb {E}} \big [{\mathcal {S}}\big (T^{(s)}\big )P^{2n-1}\big ]= & {} {\mathbb {E}} \big [{\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)})P^{2n-1}\big ]\\&+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$

Together with (7.55) we have

$$\begin{aligned} \text {(III')}= & {} -{\mathbb {E}}\big [ {\underline{G}} \,^2 P^{2n-1}\big ]+\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} \big [{\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)})P^{2n-1}\big ] \nonumber \\&+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}\right) . \end{aligned}$$
(7.56)

From Lemma 7.10, we see that the term \({\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)})\) in (7.56) is the same as in (5.18), which implies

$$\begin{aligned} {\underline{G}} \,^2-\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} {\mathcal {M}}\big (\lceil {\beta ^{-1}} \rceil -2s+2,T^{(s)}\big )=Q_0({\underline{G}} \,). \end{aligned}$$

Thus

$$\begin{aligned} \text {(III')}=-{\mathbb {E}} [Q_0P^{2n-1}]+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r}\right) , \end{aligned}$$

and together with (7.15) we conclude that

$$\begin{aligned} \text {(I')}+\text {(III')} \prec \sum _{r=1}^{2n}{\mathcal {E}}^{r} {\mathcal {P}}^{2n-r} \end{aligned}$$

as desired. This concludes the proof of Lemma 7.2 and hence also that of Proposition 6.1.

8 Proof of Proposition 6.3

Convention

Throughout this section, z is given by (6.1), where \(w \in {{\mathbf {Y}}}_*\) is deterministic.

Let us fix \(n \in {\mathbb {N}}_+\) and set

$$\begin{aligned} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}} :=\Vert {{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,)\Vert _{2n}=\Big ({\mathbb {E}} |{{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,)|^{2n}\Big )^{\frac{1}{2n}}, \quad {\mathcal {E}}_{{{\,\mathrm{Im}\,}}} :=\Big (\frac{1}{N\eta } +\Phi \Big ) \sqrt{\kappa }N^{-\delta }. \end{aligned}$$

We shall show that

$$\begin{aligned} {\mathbb {E}} |{{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,)|^{2n}={\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n} \prec {\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{2n}, \end{aligned}$$
(8.1)

from which Proposition 6.3 follows by Chebyshev’s inequality. We shall see that the proof of (8.1) is much simpler than that of (7.1), as it does not require a secondary expansion as in Sect. 7.3.3. We define the parameter

$$\begin{aligned} \Theta :=\frac{\Phi +\frac{\eta }{\sqrt{\kappa }}}{N\eta }. \end{aligned}$$

Recall the definition of \(\Gamma \) from (5.3). It is easy to check that

$$\begin{aligned} \Gamma \prec \Theta . \end{aligned}$$

In addition, recall the definitions of \(P'\), Q and \(Q_0\) from (7.2). With the help of (6.4), we obtain the following improved version of Lemma 7.1.

Lemma 8.1

We have

$$\begin{aligned} P' \prec \sqrt{\kappa }, \quad \quad \bigg |\frac{\partial ^k {\underline{G}} \,}{\partial H_{ij}^k}\bigg | \prec \max _{x,y} N^{-1}|(G^2)_{xy} |\prec \Theta \end{aligned}$$
(8.2)

and

$$\begin{aligned} \frac{\partial ^k P}{\partial H_{ij}^k} \prec \sqrt{\kappa }\,\Theta . \end{aligned}$$
(8.3)
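For orientation, the bound \(\max _{x,y} N^{-1}|(G^2)_{xy} |\prec \Theta \) in (8.2) can be verified with the Ward identity \(\sum _z |G_{xz}|^2=\eta ^{-1}{{\,\mathrm{Im}\,}}G_{xx}\); the following minimal sketch assumes the bounds \({{\,\mathrm{Im}\,}}G_{xx} \prec {{\,\mathrm{Im}\,}}{\underline{G}} \,\) and \({{\,\mathrm{Im}\,}}{\underline{G}} \, \leqslant \Phi +{{\,\mathrm{Im}\,}}m \asymp \Phi +\eta /\sqrt{\kappa }\) recorded after (8.9) below. By the Cauchy–Schwarz inequality,

$$\begin{aligned} \frac{1}{N}|(G^2)_{xy}| \leqslant \frac{1}{N}\Big (\sum _z |G_{xz}|^2\Big )^{1/2}\Big (\sum _z |G_{zy}|^2\Big )^{1/2} =\frac{({{\,\mathrm{Im}\,}}G_{xx}\,{{\,\mathrm{Im}\,}}G_{yy})^{1/2}}{N\eta } \prec \frac{\Phi +\eta /\sqrt{\kappa }}{N\eta }=\Theta . \end{aligned}$$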

By \(zG=HG-I\), we have

$$\begin{aligned} {\mathbb {E}} ({{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,))^{2n}&={\mathbb {E}} ({{\,\mathrm{Im}\,}}{\underline{HG}} \,+{{\,\mathrm{Im}\,}}Q)({{\,\mathrm{Im}\,}}P)^{2n-1} \nonumber \\&={\mathbb {E}} [{{\,\mathrm{Im}\,}}Q({{\,\mathrm{Im}\,}}P)^{2n-1}]+\frac{1}{N}\sum _{i,j} {\mathbb {E}} [H_{ij} {{\,\mathrm{Im}\,}}G_{ji} ({{\,\mathrm{Im}\,}}P)^{2n-1}], \end{aligned}$$
(8.4)

where in the second step we used that H has real entries.

Remark 8.2

Although we used that the entries of H are real in (8.4), our argument easily extends to complex entries of H. To see this, for any holomorphic \(f:{\mathbb {C}}_+\rightarrow {\mathbb {C}}\) we define \(J f(z) :=\frac{1}{2 \mathrm {i}} (f(z) - f({{\overline{z}} \,}))\). We view all quantities appearing in our arguments as functions of z and use the operator J instead of \({{\,\mathrm{Im}\,}}\). Then it is easy to check that in both the real and complex cases, Proposition 6.3 as well as all its consequences remain true if we replace \({{\,\mathrm{Im}\,}}\) by J everywhere. Note that \({{\,\mathrm{Im}\,}}{{\underline{G}} \,} = J {{\underline{G}} \,}\) and \({{\,\mathrm{Im}\,}}P = J P\), but in general \({{\,\mathrm{Im}\,}}G_{ij} \ne J G_{ij}\). An alternative point of view is to regard all of our quantities as functions of z and H, and to take the imaginary part with respect to the Hermitian conjugation of z and H.
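To make this concrete, note that for Hermitian H we have \(G({{\overline{z}} \,})=G(z)^*\), so that

$$\begin{aligned} J{\underline{G}} \,=\frac{1}{2\mathrm {i}}\Big (\frac{1}{N}{{\,\mathrm{Tr}\,}}G(z)-\frac{1}{N}{{\,\mathrm{Tr}\,}}G(z)^*\Big )={{\,\mathrm{Im}\,}}{\underline{G}} \,, \qquad JG_{ij}=\frac{G_{ij}(z)-\overline{G_{ji}(z)}}{2\mathrm {i}}, \end{aligned}$$

and the latter coincides with \({{\,\mathrm{Im}\,}}G_{ij}\) precisely when \(G_{ij}=G_{ji}\), i.e. in the real symmetric case.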

Similarly to (7.11), we can use Lemma 2.1 on the last term of (8.4), and get

$$\begin{aligned}&{\mathbb {E}} ({{\,\mathrm{Im}\,}}P(z,{\underline{G}} \,))^{2n} \nonumber \\&\quad ={\mathbb {E}}[ {{\,\mathrm{Im}\,}}Q ({{\,\mathrm{Im}\,}}P)^{2n-1}]+\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) \nonumber \\&\quad \qquad {\mathbb {E}} \bigg [ \frac{\partial ^k {{\,\mathrm{Im}\,}}G_{ij}}{\partial H_{ij}^k} ({{\,\mathrm{Im}\,}}P)^{2n-1}\bigg ] \nonumber \\&\qquad +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s( {{\,\mathrm{Im}\,}}P)^{2n-1}}{\partial H_{ij}^s} \frac{\partial ^{k-s} {{\,\mathrm{Im}\,}}G_{ij}}{\partial H_{ij}^{k-s}}\bigg ] \nonumber \\&\qquad +O_{\prec }(N^{-4n}) \nonumber \\&\quad =:\text {(V)}+\text {(VI)}+\text {(VII)}+O_{\prec }(N^{-4n}). \end{aligned}$$
(8.5)

We shall prove the following result, which directly implies (8.1).

Lemma 8.3

Let (V)–(VII) be as in (8.5). Then

$$\begin{aligned} \mathrm {(V)}+\mathrm {(VI)} \prec \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r} \end{aligned}$$
(8.6)

and

$$\begin{aligned} \mathrm {(VII)} \prec \sum _{r=1}^{2n} {\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}. \end{aligned}$$
(8.7)

8.1 Proof of (8.7)

Define

$$\begin{aligned} X_k^{(2)}:=\frac{1}{N}\frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s( {{\,\mathrm{Im}\,}}P)^{2n-1}}{\partial H_{ij}^s} \frac{\partial ^{k-s} {{\,\mathrm{Im}\,}}G_{ij}}{\partial H_{ij}^{k-s}}\bigg ], \end{aligned}$$
(8.8)

so that \(\mathrm {(VII)}=\sum _{k=1}^{\ell }X_{k}^{(2)}\). Note that for \(f:{\mathbb {R}} \rightarrow {\mathbb {C}}\) and real h, \(\frac{\mathrm {d}{{\,\mathrm{Im}\,}}f(h)}{\mathrm {d}h}={{\,\mathrm{Im}\,}}\frac{\mathrm {d}f(h)}{\mathrm {d}h}\), so that the derivatives in (8.8) can be computed through (7.3).
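Indeed, writing \(f=u+\mathrm {i}v\) with real-valued u and v,

$$\begin{aligned} \frac{\mathrm {d}{{\,\mathrm{Im}\,}}f(h)}{\mathrm {d}h}=v'(h)={{\,\mathrm{Im}\,}}\big (u'(h)+\mathrm {i}v'(h)\big )={{\,\mathrm{Im}\,}}\frac{\mathrm {d}f(h)}{\mathrm {d}h}; \end{aligned}$$

this is also the point where the argument uses that the entries of H are real (cf. Remark 8.2). Let us estimate each \(X_k^{(2)}\).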

For any fixed \(k \in {\mathbb {N}}_+\), it is easy to see from (8.3) that

$$\begin{aligned} X_{k}^{(2)}\prec \frac{1}{N^2}\sum _{s=1}^k \sum _{r=1}^{2n-1} \sum _{i,j}{\mathbb {E}} \bigg |(\sqrt{\kappa }\Theta )^r({{\,\mathrm{Im}\,}}P)^{2n-1-r} {{\,\mathrm{Im}\,}}\frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\bigg |. \end{aligned}$$

By (7.3) and Proposition 2.5, we see that

$$\begin{aligned} {{\,\mathrm{Im}\,}}\frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}&\prec \big |{{\,\mathrm{Im}\,}}\big ( G_{ii}^{\lfloor {(k-s+1)/2} \rfloor }G_{jj}^{\lfloor {(k-s+1)/2} \rfloor }\big )\big |+|G_{ij}|+\max _{x,y \in \{i,j\}}\big |N^{-1}(G^2)_{xy}\big | \nonumber \\&\prec \max _{x \in \{i,j\}}{{\,\mathrm{Im}\,}}G_{xx}+|G_{ij}|+\max _{x,y \in \{i,j\}}\big |N^{-1}(G^2)_{xy}\big | \nonumber \\&\prec {{\,\mathrm{Im}\,}}{\underline{G}} \,+|G_{ij}|+\Theta , \end{aligned}$$
(8.9)

where in the last step we estimated \({{\,\mathrm{Im}\,}}G_{xx}\) by \(O_\prec ({{\,\mathrm{Im}\,}}{{\underline{G}} \,})\), using its spectral decomposition and Lemma 2.6 (see the sketch following (8.10) below). Here we see the crucial effect of taking the imaginary part of P, which results in \({{\,\mathrm{Im}\,}}{\underline{G}} \,\) on the right-hand side of (8.9) instead of \({\underline{G}} \,\). Note that \({{\,\mathrm{Im}\,}}{\underline{G}} \, \leqslant \Phi +{{\,\mathrm{Im}\,}}m\asymp \Phi +\eta /\sqrt{\kappa }\), and together with Lemma 2.8 we have

$$\begin{aligned} \frac{1}{N^2} \sum _{i,j}\bigg |{{\,\mathrm{Im}\,}}\frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}} \bigg |\prec & {} \Phi +\frac{\eta }{\sqrt{\kappa }}+\frac{1}{N^2} \sum _{i,j}|G_{ij}|+\Theta \\\prec & {} \Phi +\frac{\eta }{\sqrt{\kappa }}+\Theta ^{1/2}+\Theta \prec \Phi + \Theta ^{1/2}. \end{aligned}$$

Thus

$$\begin{aligned} X_k^{(2)} \prec \sum _{r=1}^{2n-1} (\sqrt{\kappa }\Theta )^{r} (\Phi +\Theta ^{1/2}) {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1-r}. \end{aligned}$$
(8.10)
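For completeness, here is the spectral decomposition step used in (8.9); a minimal sketch, assuming that Lemma 2.6 provides the delocalization bound \(\max _{\alpha ,x}|u_{\alpha }(x)|^2 \prec N^{-1}\), where \(u_{\alpha }\) and \(\mu _{\alpha }\) denote the eigenvectors and eigenvalues of H (this notation is ad hoc here). Writing \(z=E+\mathrm {i}\eta \),

$$\begin{aligned} {{\,\mathrm{Im}\,}}G_{xx}=\sum _{\alpha }\frac{\eta \,|u_{\alpha }(x)|^2}{(\mu _{\alpha }-E)^2+\eta ^2} \prec \frac{1}{N}\sum _{\alpha }\frac{\eta }{(\mu _{\alpha }-E)^2+\eta ^2}={{\,\mathrm{Im}\,}}{\underline{G}} \,. \end{aligned}$$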

By the Cauchy–Schwarz inequality and (6.5) we have \(\Theta ^{1/2}\prec \Psi +\frac{1}{N\eta }+\frac{\eta }{\sqrt{\kappa }}\prec \Psi +\frac{1}{N\eta }\), and thus

$$\begin{aligned} (\sqrt{\kappa }\Theta ) (\Phi +\Theta ^{1/2}) \prec (\sqrt{\kappa }\Theta ) \Big (\Psi +\frac{1}{N\eta }\Big )=N^{\delta }\Theta \cdot {\mathcal {E}}_{{{\,\mathrm{Im}\,}}} \prec {\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^2. \end{aligned}$$

Together with \( (\sqrt{\kappa }\Theta )^{s} \prec {\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{s} \) for all \(s\geqslant 0\), we get

$$\begin{aligned} (\sqrt{\kappa }\Theta )^{r} (\Phi +\Theta ^{1/2}) \prec {\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r+1} \end{aligned}$$

for all \(r\geqslant 1\). Combining the above estimate with (8.10), we have

$$\begin{aligned} X_k^{(2)} \prec \sum _{r=1}^{2n-1} {\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r+1} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1-r}. \end{aligned}$$

This concludes the proof of (8.7).

8.2 Proof of (8.6)

The proof is similar to the estimate of (7.15) in Sect. 7.5. Define

$$\begin{aligned} X^{(3)}_k:=\frac{1}{N}\frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k {{\,\mathrm{Im}\,}}G_{ij}}{\partial H_{ij}^k} ({{\,\mathrm{Im}\,}}P)^{2n-1}\bigg ], \end{aligned}$$

so that \(\mathrm {(VI)}=\sum _{k=1}^{\ell }X_k^{(3)}\). We analyse each \(X_k^{(3)}\).

Consider first the case when k is odd. For \(k=1\), it is easy to see from (7.3) and Lemma 2.8 that

$$\begin{aligned} X_1^{(3)}&=\frac{1}{N^2}\sum _{i,j}{\mathbb {E}}\big [-{{\,\mathrm{Im}\,}}(G_{ii}G_{jj}) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ]+O_{\prec }\big (\Theta {\mathbb {E}} \big [|{{\,\mathrm{Im}\,}}P|^{2n-1}\big ]\big ) \nonumber \\&=-{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}({\underline{G}} \,^2) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ]+O_{\prec } \big ({\mathcal {E}}_{{{\,\mathrm{Im}\,}}} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1}\big ). \end{aligned}$$
(8.11)

When \(k \geqslant 3\) is odd, we see from (7.3) and Lemma 2.8 that

$$\begin{aligned} X_k^{(3)}= & {} \frac{1}{N^{2+(k-1)\beta }} \sum _{i,j} a_{ij}^{(k)}{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}\big (G_{ii}^{(k+1)/2}G_{jj}^{(k+1)/2}\big )({{\,\mathrm{Im}\,}}P)^{2n-1}\big ] \nonumber \\&+O_{\prec } \big ({\mathcal {E}}_{{{\,\mathrm{Im}\,}}} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1}\big ), \end{aligned}$$
(8.12)

where \(a^{(k)}_{ij}\) is deterministic and uniformly bounded.

For even k, we see from (7.3) and Lemma 2.8 that

$$\begin{aligned} X_{k}^{(3)}= & {} \frac{1}{N^{2+(k-1)\beta }} \sum _{i,j} a_{ij}^{(k)}{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}\big (G_{ij}G_{ii}^{k/2}G_{jj}^{k/2}\big )({{\,\mathrm{Im}\,}}P)^{2n-1}\big ]\nonumber \\&+O_{\prec } \big ({\mathcal {E}}_{{{\,\mathrm{Im}\,}}} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1}\big ), \end{aligned}$$
(8.13)

where \(a^{(k)}_{ij}\) is deterministic and uniformly bounded. Note that the analogue of (8.13) has appeared in (7.53). To handle this term, we use the following result.

Lemma 8.4

Fix an even \(k \geqslant 2\). Let \(\big (a^{(k)}_{ij}\big )_{i,j=1}^N\) be deterministic and uniformly bounded. Then

$$\begin{aligned} \frac{1}{N^{2}} \sum _{i,j} a_{ij}^{(k)}{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}\big (G_{ij}G_{ii}^{k/2}G_{jj}^{k/2}\big )({{\,\mathrm{Im}\,}}P)^{2n-1}\big ] \prec \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^r {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}. \end{aligned}$$

Proof

The proof essentially follows the strategy used to prove Lemmas 5.3 and 7.4. We use the identity

$$\begin{aligned} G_{ij}=\delta _{ij}{\underline{G}} \,+G_{ij}{\underline{HG}} \,-(HG)_{ij}{\underline{G}} \, \end{aligned}$$

to replace the \(G_{ij}\) in the equation, and then expand the terms containing H using Lemma 2.1. We omit the details. \(\square \)

Lemma 8.4 immediately implies

$$\begin{aligned} \sum _{s=2}^{\lceil {\ell /2} \rceil }X^{(3)}_{2s} \prec \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}. \end{aligned}$$
(8.14)

Combining (8.11)–(8.14), we have

$$\begin{aligned} \text {(VI)}&=-{\mathbb {E}}\big [ {{\,\mathrm{Im}\,}}({\underline{G}} \,^2) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ] \\&\quad +\sum _{s=2}^{\lceil {\ell /2} \rceil }\frac{1}{N^{2+(2s-2)\beta }}\sum _{i,j}a_{ij}^{(2s-1)}{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}\big (G_{ii}^sG_{jj}^s\big )({{\,\mathrm{Im}\,}}P)^{2n-1}\big ]\\&\quad +O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}\right) \\&=:-{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}({\underline{G}} \,^2) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ] +\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} \big [{{\,\mathrm{Im}\,}}{\mathcal {S}}(T^{(s)}) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ]\\&\quad +O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}\right) . \end{aligned}$$

Here we recall the definition of \({\mathcal {S}}(T)\) in (5.2), and observe that \(T^{(s)}\) above is the same as in (5.16). To handle the last relation, we introduce the following analogue of Lemmas 5.6 and 7.7.

Lemma 8.5

Let \(T \in {\mathcal {T}}_0\) with \(\nu _1(T)-\theta (T) \leqslant 0\). Fix \(r \in {\mathbb {N}}\), and let \({\mathcal {M}}(r,T)\) be as in Lemma 5.6. Then

$$\begin{aligned} {\mathbb {E}} [{{\,\mathrm{Im}\,}}{\mathcal {S}}(T)({{\,\mathrm{Im}\,}}P)^{2n-1}]= & {} {\mathbb {E}} [{{\,\mathrm{Im}\,}}{\mathcal {M}}(r, T)({{\,\mathrm{Im}\,}}P)^{2n-1}] +O_{\prec }\big (\big (\Theta +N^{-\beta (r+1)}\big ){\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1}\big ) \\&+O_{\prec }\left( \sum _{r'=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r'} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r'}\right) . \end{aligned}$$

Proof

The proof is similar to those of Lemmas 5.6 and 7.7. We use the identity

$$\begin{aligned} G_{ii}={\underline{G}} \,+G_{ii}{\underline{HG}} \,-(HG)_{ii}{\underline{G}} \, \end{aligned}$$

to replace the diagonal entries in \({\mathcal {S}}(T)\), and then expand the terms containing H using Lemma 2.1. We omit the details. \(\square \)

By Lemma 8.5, we have, for any \(s \in \{2,3,\ldots ,\lceil {\ell /2} \rceil \}\),

$$\begin{aligned} {\mathbb {E}} \big [{{\,\mathrm{Im}\,}}{\mathcal {S}}(T^{(s)}) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ]= & {} {\mathbb {E}} \big [{{\,\mathrm{Im}\,}}{\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)}) ({{\,\mathrm{Im}\,}}P)^{2n-1}\big ] \\&+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}\right) . \end{aligned}$$

Thus

$$\begin{aligned} \text {(VI)}= & {} -{\mathbb {E}}[ {{\,\mathrm{Im}\,}}({\underline{G}} \,^2) ({{\,\mathrm{Im}\,}}P)^{2n-1}] +\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} [ {{\,\mathrm{Im}\,}}{\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)}) ({{\,\mathrm{Im}\,}}P)^{2n-1}]\\&+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}\right) . \end{aligned}$$

From Lemma 8.5, we see that the term \({\mathcal {M}}(\lceil {\beta ^{-1}-2s+2} \rceil ,T^{(s)})\) above is the same as in (5.18), which implies

$$\begin{aligned} {\underline{G}} \,^2-\sum _{s=2}^{\lceil {\ell /2} \rceil }{\mathbb {E}} {\mathcal {M}}\big (\lceil {\beta ^{-1}} \rceil -2s+2,T^{(s)}\big )=Q_0({\underline{G}} \,). \end{aligned}$$

Thus

$$\begin{aligned} \text {(VI)}=-{\mathbb {E}} [ {{\,\mathrm{Im}\,}}Q_0 ({{\,\mathrm{Im}\,}}P)^{2n-1}]+O_{\prec }\left( \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r}\right) . \end{aligned}$$

In addition, note that

$$\begin{aligned} {\mathbb {E}} [ {{\,\mathrm{Im}\,}}({\mathcal {Z}} {\underline{G}} \,^2) ({{\,\mathrm{Im}\,}}P)^{2n-1}] \prec \frac{1}{N^{1/2+\beta }} {\mathbb {E}}[{{\,\mathrm{Im}\,}}{\underline{G}} \,\, |{{\,\mathrm{Im}\,}}P|^{2n-1}] \prec \frac{\Phi }{N^{1/2+\beta }} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-1}. \end{aligned}$$

From the definition of \(\mathrm {(V)}\) in (8.5), we conclude that

$$\begin{aligned} \text {(V)}+\text {(VI)} \prec \sum _{r=1}^{2n}{\mathcal {E}}_{{{\,\mathrm{Im}\,}}}^{r} {\mathcal {P}}_{{{\,\mathrm{Im}\,}}}^{2n-r} \end{aligned}$$

as desired. This concludes the proof of Lemma 8.3, and hence also that of Proposition 6.3.

9 Proof of Lemma 4.2

Convention

Throughout this section, z is given by (6.1), where w is deterministic and contained in

$$\begin{aligned} {{\mathbf {D}}}=\bigg \{w=\kappa +\mathrm {i}\eta \in {\mathbb {C}}_+:\eta \in [N^{-1+c},1], \kappa \in [-3,3], \eta +|\kappa |\geqslant \frac{1}{N^{1/2+\delta }q} \bigg \} \end{aligned}$$
(9.1)

where \(c>0\) is fixed.

The key to proving Lemma 4.2 is the following result.

Proposition 9.1

Suppose \(|{\underline{G}} \,-m| \prec \Psi \) for some deterministic \(\Psi \in [N^{-1},1]\). Then

$$\begin{aligned} P(z, {\underline{G}} \,) \prec \Big (\frac{1}{N\eta }+\frac{1}{\sqrt{N\eta }q^{3/2}}\Big )\big (\Psi +\sqrt{\eta +|\kappa |}\,\big ) \end{aligned}$$

uniformly for all \(w \in {{\mathbf {D}}}\).

The stability analysis of P was carried out for the region \({{\mathbf {Y}}}\) in Lemma 6.2, and one easily checks that the same result holds for the region \({{\mathbf {D}}}\). This leads to the next lemma.

Lemma 9.2

Lemma 6.2 holds provided that \({{\mathbf {Y}}}\) is replaced with \({{\mathbf {D}}}\).

Combining Proposition 9.1 and Lemma 9.2, we obtain the implication

$$\begin{aligned} |{\underline{G}} \,-m| \prec \Psi \implies |{\underline{G}} \,-m| \prec \Big (\frac{1}{N\eta }+\frac{1}{\sqrt{N\eta }q^{3/2}}\Big )^{1/2} \Psi ^{1/2}+ \frac{1}{N\eta }+\frac{1}{\sqrt{N\eta }q^{3/2}}, \end{aligned}$$

and thus

$$\begin{aligned} |{\underline{G}} \,-{m}| \prec \frac{1}{N\eta } +\frac{1}{(N\eta )^{1/2}q^{3/2}} \end{aligned}$$
(9.2)

uniformly for all \(w \in {{\mathbf {D}}}\). By the rigidity estimate (9.2), together with a standard analysis using the Helffer–Sjöstrand formula (e.g. [12, Proposition 3.2]), one immediately concludes the proof of Lemma 4.2.
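For the reader's orientation, (9.2) follows from the preceding implication by the usual bootstrap. A minimal sketch, abbreviating \(R:=\frac{1}{N\eta }+\frac{1}{\sqrt{N\eta }\,q^{3/2}}\) and assuming an a priori bound \(|{\underline{G}} \,-m| \prec \Psi _0\) with deterministic \(\Psi _0 \in [N^{-1},1]\) to start the iteration: applying the implication repeatedly gives \(|{\underline{G}} \,-m| \prec \Psi _k\) with

$$\begin{aligned} \Psi _{k+1}:=(R\Psi _k)^{1/2}+R, \qquad \text {so that by induction} \quad \Psi _k \prec R^{1-2^{-k}}+R. \end{aligned}$$

Since \(R \geqslant N^{-1}\), the loss \(R^{-2^{-k}} \leqslant N^{2^{-k}}\) is absorbed into \(\prec \) after finitely many (N-independent) iterations, yielding (9.2).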

The rest of the section is devoted to the proof of Proposition 9.1. It is simpler than that of Proposition 6.1, and we only give a sketch. A detailed proof of a slightly weaker result can be found in [19, Proposition 2.9].

9.1 Proof of Proposition 9.1

Fix \(n \in {\mathbb {N}}_+\) and set

$$\begin{aligned}&{\mathcal {P}} :=\Vert P(z,{\underline{G}} \,)\Vert _{2n}=\Big ({\mathbb {E}} |P(z,{\underline{G}} \,)|^{2n}\Big )^{\frac{1}{2n}}, \\&{\mathcal {E}}_1 :=\Big (\frac{1}{N\eta }+\frac{1}{\sqrt{N\eta }q^{3/2}}\Big )\big (\Psi +\sqrt{\eta +|\kappa |}\,\big ). \end{aligned}$$

We shall show that

$$\begin{aligned} {\mathbb {E}} |P(z,{\underline{G}} \,)|^{2n}={\mathcal {P}}^{2n} \prec {\mathcal {E}}_1^{2n}, \end{aligned}$$
(9.3)

and Proposition 9.1 then follows by Chebyshev’s inequality.
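For completeness, the Chebyshev step reads as follows: for any fixed \(\varepsilon >0\), (9.3) implies

$$\begin{aligned} {\mathbb {P}} \big (|P(z,{\underline{G}} \,)|>N^{\varepsilon }{\mathcal {E}}_1\big ) \leqslant \frac{{\mathbb {E}} |P(z,{\underline{G}} \,)|^{2n}}{N^{2n\varepsilon }{\mathcal {E}}_1^{2n}} \prec N^{-2n\varepsilon }, \end{aligned}$$

and since \(n \in {\mathbb {N}}_+\) is arbitrary, this is precisely the statement \(P(z,{\underline{G}} \,) \prec {\mathcal {E}}_1\).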

We shall see that the proof of (9.3) is much simpler than that of (7.1), as it does not require a secondary expansion as in Sect. 7.3.3. Recall the definitions of \(P'\), Q and \(Q_0\) from (7.2), and recall the definition of \(\Upsilon \) from (7.5). We have the bound

$$\begin{aligned} \Upsilon \prec {\mathcal {E}}_1. \end{aligned}$$
(9.4)

In addition, note that Lemma 7.1 remains true for \(w \in {{\mathbf {D}}}\).

Similarly to (7.11), we have

$$\begin{aligned} {\mathbb {E}} |P|^{2n}&={\mathbb {E}} Q_0 P^{n-1}P^{*n}+{\mathbb {E}}{\mathcal {Z}} {\underline{G}} \,^2P^{n-1}P^{*n} \nonumber \\&\quad +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^k G_{ij}}{\partial H_{ij}^k} P^{n-1}P^{*n}\bigg ] \nonumber \\&\quad +\frac{1}{N}\sum _{k=1}^\ell \frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s( P^{n-1}P^{*n})}{\partial H_{ij}^s} \frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\bigg ] \nonumber \\&\quad +O_{\prec }(N^{-4n}) \nonumber \\&=:\text {(VIII)}+\text {(IX)}+\text {(X)}+\text {(XI)} +O_{\prec }(N^{-4n}). \end{aligned}$$
(9.5)

The following result directly implies (9.3).

Lemma 9.3

We have

$$\begin{aligned} \mathrm {(IX)}+\mathrm {(XI)} \prec \sum _{r=1}^{2n}{\mathcal {E}}_1^r {\mathcal {P}}^{2n-r} \end{aligned}$$
(9.6)

as well as

$$\begin{aligned} \mathrm {(VIII)}+\mathrm {(X)} \prec \sum _{r=1}^{2n}{\mathcal {E}}_1^r {\mathcal {P}}^{2n-r}. \end{aligned}$$
(9.7)

We now sketch the proof of (9.6). The proof of (9.7) follows in a similar fashion. Let us first consider (XI). We write \(\text {(XI)}=\sum _{k=1}^{\ell }X^{(4)}_k\), where

$$\begin{aligned} X_k^{(4)}:=\frac{1}{N}\frac{1}{k!}\sum _{s=1}^k {k \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{k+1}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s (P^{n-1}P^{*n})}{\partial H_{ij}^s} \frac{\partial ^{k-s} G_{ij}}{\partial H_{ij}^{k-s}}\bigg ]. \end{aligned}$$

For \(k=1\), one can repeat the steps in Sect. 7.3.1 and show that

$$\begin{aligned} X_{1}^{(4)} \prec \Upsilon ^2 {\mathbb {E}} |P^{2n-2} | \prec \Upsilon ^2 {\mathcal {P}}^{2n-2}. \end{aligned}$$

Note that we have the bound (9.4), which implies

$$\begin{aligned} X_{1}^{(4)} \prec {\mathcal {E}}_1^2 {\mathcal {P}}^{2n-2}. \end{aligned}$$
(9.8)

For \(k=2\), one can follow the steps in Section 2 of [19], and show that \(X_2^{(4)} \prec \sum _{r=2}^{2n} \Upsilon ^{r} {\mathcal {P}}^{2n-r}\). Thus,

$$\begin{aligned} X_2^{(4)} \prec \sum _{r=2}^{2n} {\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$

A similar strategy works for all even \(k \geqslant 4\). This gives

$$\begin{aligned} \sum _{s=2}^{\lceil {\ell /2} \rceil }X^{(4)}_{2s} \prec \sum _{r=2}^{2n}{\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$
(9.9)

For \(k=3\), we split \(X_3^{(4)}=X_{3,1}^{(4)}+X_{3,2}^{(4)}+X_{3,3}^{(4)}\), where

$$\begin{aligned} X_{3,s}^{(4)}:=\frac{1}{N}\frac{1}{3!} {3 \atopwithdelims ()s}\sum _{i,j} {\mathcal {C}}_{4}(H_{ij}) {\mathbb {E}} \bigg [ \frac{\partial ^s (P^{n-1}P^{*n})}{\partial H_{ij}^s} \frac{\partial ^{3-s} G_{ij}}{\partial H_{ij}^{3-s}}\bigg ] \end{aligned}$$

for \(s=1,2,3\). Similarly to Step 1 of Sect. 7.3.3, we can show that

$$\begin{aligned} X_{3,1}^{(4)}+X_{3,3}^{(4)} \prec \frac{1}{N^{2\beta }} \sum _{t=0}^{2n-2} ((\Psi +\sqrt{|\kappa |+\eta })\Upsilon )^{t+1} \sqrt{\Upsilon } {\mathcal {P}}^{2n-2-t} \prec \sum _{r=2}^{2n} {\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}, \end{aligned}$$

where in the last step we used

$$\begin{aligned} \frac{1}{N^{2\beta }} ((\Psi +\sqrt{|\kappa |+\eta })\Upsilon )^{t+1} \sqrt{\Upsilon }\prec & {} {\mathcal {E}}_1^t \cdot \frac{1}{N^{2\beta }} ((\Psi +\sqrt{|\kappa |+\eta })\Upsilon ) \sqrt{\Upsilon } \\\prec & {} {\mathcal {E}}_1^t\cdot \frac{\Psi +\sqrt{|\kappa |+\eta }}{N\eta } \cdot \frac{\Psi +\sqrt{|\kappa |+\eta }}{\sqrt{N\eta }N^{2\beta }} \\\prec & {} {\mathcal {E}}_1^{2+t}. \end{aligned}$$

Now consider \(X_{3,2}^{(4)}\). Similarly to (7.26), we have

$$\begin{aligned} X^{(4)}_{3,2}&=-\frac{2n-2}{N^2}\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}[ P^{*n}P^{n-2}P'(G^2)_{ii}G_{ii}G^2_{jj}+P^{*n}P^{n-2}P'{\underline{G^2}} \,G_{ii}G_{jj}\\&\quad +P^{*n}P^{n-2}({\underline{G}} \,+{\underline{G}} \,^2)G_{ii}G_{jj}]\\&\quad -\frac{2n}{N^2}\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}[ |P|^{2n-2}{\overline{P}}'(G^{*2})_{ii}G_{ii}|G_{jj}|^{2}\\&\quad +|P|^{2n-2}{\overline{P}}'{\underline{G^{*2}}} \,G_{ii}G_{jj} +|P|^{2n-2}({\underline{G^*}} \,+{\underline{G^*}} \,^{2})G_{ii}G_{jj}] +O_{\prec }({\mathcal {E}}_1^2{\mathcal {P}}^{2n-2})\\&=:X_{3,2,1}^{(4)}+O_{\prec }({\mathcal {E}}_1^2{\mathcal {P}}^{2n-2}). \end{aligned}$$

Thus

$$\begin{aligned} X^{(4)}_3=X^{(4)}_{3,2,1}+\sum _{r=2}^{2n} O_{\prec }({\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}). \end{aligned}$$
(9.10)

A similar strategy works for all odd \(k \geqslant 5\). Note that (9.10) implies the bound

$$\begin{aligned} X_{3}^{(4)} \prec \frac{(\Psi +\sqrt{\eta +|\kappa |})^2}{N\eta q^2} {\mathcal {P}}^{2n-2}+\sum _{r=2}^{2n} {\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$

By Lemma 2.2, we see that, compared to \(X^{(4)}_3\), there will be additional factors of \(N^{-(k-2)\beta }\) in \(X^{(4)}_k\) for all \(k\geqslant 4\). Thus we can show that

$$\begin{aligned} \sum _{s=2}^{\lceil {\ell /2} \rceil }X^{(4)}_{2s+1}\prec & {} \frac{1}{N^{2\beta }}\bigg (\frac{(\Psi +\sqrt{\eta +|\kappa |})^2}{N\eta q^2} {\mathcal {P}}^{2n-2} +\sum _{r=2}^{2n} {\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}\bigg ) \\\prec & {} \sum _{r=2}^{2n} {\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}. \end{aligned}$$

Using the above relation, together with (9.8)–(9.10), we get

$$\begin{aligned} \mathrm {(XI)}=X^{(4)}_{3,2,1}+\sum _{r=2}^{2n} O_{\prec }({\mathcal {E}}_1^{r} {\mathcal {P}}^{2n-r}). \end{aligned}$$
(9.11)

The computation of (IX) is similar, and we can show that

$$\begin{aligned} \mathrm {(IX)}&=\frac{2n-2}{N^2}\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}\big [ P^{*n}P^{n-2}P'(G^2)_{ii}G_{jj}{\underline{G}} \,^2\\&\quad +P^{*n}P^{n-2}P'{\underline{G^2}} \,{\underline{G}} \,^2+P^{*n}P^{n-2}({\underline{G}} \,+{\underline{G}} \,^2){\underline{G}} \,^2\big ]\\&\quad +\frac{2n}{N^2}\sum _{i,j} {\mathcal {C}}_4(H_{ij}) {\mathbb {E}}\big [ |P|^{2n-2}{\overline{P}}'(G^{*2})_{ii}G^*_{jj} {\underline{G}} \,^2 +|P|^{2n-2}{\overline{P}}'{\underline{G^{*2}}} \,\,{\underline{G}} \,^2\\&\quad +|P|^{2n-2}({\underline{G^*}} \,+{\underline{G^*}} \,^{2})G_{ii}G_{jj}\big ] +O_{\prec }({\mathcal {E}}_1^2{\mathcal {P}}^{2n-2}). \end{aligned}$$

By Proposition 2.5, we have

$$\begin{aligned} {\underline{G}} \,^2-G_{ii}G_{jj} \prec \frac{1}{q}+\frac{1}{\sqrt{N\eta }}. \end{aligned}$$
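A minimal way to see this, assuming that Proposition 2.5 provides the entrywise bounds \(|G_{ii}-m| \prec \frac{1}{q}+\frac{1}{\sqrt{N\eta }}\) and \(|{\underline{G}} \,-m| \prec \frac{1}{q}+\frac{1}{\sqrt{N\eta }}\) together with \(|G_{ii}| \prec 1\) and \(|{\underline{G}} \,| \prec 1\), is to write

$$\begin{aligned} {\underline{G}} \,^2-G_{ii}G_{jj}=({\underline{G}} \,-G_{ii}){\underline{G}} \,+G_{ii}({\underline{G}} \,-G_{jj}) \prec \frac{1}{q}+\frac{1}{\sqrt{N\eta }}. \end{aligned}$$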

Thus, there is a cancellation between the leading order terms of (IX) and (XI), which implies

$$\begin{aligned} \mathrm {(IX)+(XI)} \prec \sum _{r=2}^{2n} \Upsilon ^{r} {\mathcal {P}}^{2n-r} \end{aligned}$$

as desired. This concludes the proof of Lemma 9.3, and hence also those of Proposition 9.1 and Lemma 4.2.

10 Proof of the improved estimates for abstract polynomials

In this section we repeatedly use the following identity.

Lemma 10.1

We have

$$\begin{aligned} G_{ij}=\delta _{ij}{\underline{G}} \,+G_{ij}{\underline{HG}} \,-(HG)_{ij}{\underline{G}} \,. \end{aligned}$$

Proof

The resolvent identity \((H-z)G=I\) gives \(HG=I+zG\), so that \({\underline{HG}} \,=1+z{\underline{G}} \,\) and \((HG)_{ij}=\delta _{ij}+zG_{ij}\). Hence

$$\begin{aligned} zG_{ij}{\underline{G}} \,=G_{ij}{\underline{HG}} \,-G_{ij}=(HG)_{ij}{\underline{G}} \,-\delta _{ij}{\underline{G}} \,, \end{aligned}$$

from which the proof follows. \(\square \)

Let f(G) be a function of the entries of G. We compute \({\mathbb {E}} f(G)G_{ij}\) through

$$\begin{aligned} {\mathbb {E}} f(G)G_{ij}={\mathbb {E}} f(G)\delta _{ij}{\underline{G}} \,+{\mathbb {E}} f(G)G_{ij}{\underline{HG}} \,-{\mathbb {E}} f(G)(HG)_{ij}{\underline{G}} \,, \end{aligned}$$

and we shall see, by Lemma 2.1, that the last two terms above cancel each other up to leading order. As a result, we can replace \({\mathbb {E}} f(G)G_{ij}\) by the slightly nicer quantity \({\mathbb {E}} f(G)\delta _{ij}{\underline{G}} \,\). This is the idea that we use throughout this section; it is illustrated below in the simplest case \(f \equiv 1\).
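The following sketch of the case \(f \equiv 1\) keeps only the \(k=1\) term of Lemma 2.1 with \({\mathcal {C}}_2(H_{xy}) \approx N^{-1}\), and uses the standard differentiation rule \(\frac{\partial G_{ab}}{\partial H_{xy}}=-(G_{ax}G_{yb}+G_{ay}G_{xb})\) for \(x \ne y\) (cf. (7.3)):

$$\begin{aligned} {\mathbb {E}} \,G_{ij}{\underline{HG}} \,&=\frac{1}{N}\sum _{x,y}{\mathbb {E}} \,H_{xy}G_{yx}G_{ij} \approx \frac{1}{N^2}\sum _{x,y}{\mathbb {E}} \,\frac{\partial (G_{yx}G_{ij})}{\partial H_{xy}} \approx -{\mathbb {E}} \,{\underline{G}} \,^2G_{ij}, \\ {\mathbb {E}} \,(HG)_{ij}{\underline{G}} \,&=\sum _{x}{\mathbb {E}} \,H_{ix}G_{xj}{\underline{G}} \, \approx \frac{1}{N}\sum _{x}{\mathbb {E}} \,\frac{\partial (G_{xj}{\underline{G}} \,)}{\partial H_{ix}} \approx -{\mathbb {E}} \,{\underline{G}} \,^2G_{ij}. \end{aligned}$$

In each line the displayed leading term comes from the derivative that produces two diagonal entries of G; the remaining terms contain off-diagonal entries and are smaller by Lemma 2.8. The two leading terms cancel, leaving \({\mathbb {E}} \,G_{ij} \approx \delta _{ij}{\mathbb {E}} \,{\underline{G}} \,\).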

In each of the following subsections, the assumptions on z are given by the assumptions of the corresponding lemma being proved.

10.1 Proof of Lemma 5.3

As discussed in Remark 5.4, it suffices to look at the case \(\nu _2=1\).

Without loss of generality, let \(T_{i_1,\ldots ,i_{\nu _1}}=a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }G_{i_1i_2}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\), where \(x_2,\ldots ,x_\sigma \in \{i_1,\ldots ,i_{\nu _1}\}\), and \(a_{i_1,\ldots ,i_{\nu _1}}\) is uniformly bounded. Using Lemma 10.1 for \(i=i_1\) and \(j=i_2\), we have

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}} (T)&=\sum _{i_2,\ldots ,i_{\nu _1}} a_{i_2,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} {\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma } \nonumber \\&\quad +\sum _{i_1,\ldots ,i_{\nu _1},x,y} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta -1}{\mathbb {E}} H_{xy}G_{yx}G_{i_1i_2}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma } \nonumber \\&\quad -\sum _{i_1,\ldots ,i_{\nu _1},x} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} H_{i_1x}G_{xi_2}{\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }. \end{aligned}$$
(10.1)

By Lemma 2.1 and estimating the remainder term for large enough \(\ell \), the second last term in (10.1) becomes

$$\begin{aligned}&\sum _{k=1}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x,y} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta -1}\frac{1}{k!}{\mathcal {C}}_{k+1}(H_{xy}){\mathbb {E}} \frac{\partial ^k G_{yx}G_{i_1i_2}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }}{\partial H_{xy}^k}\\&\qquad +O_{\prec }\big (N^{\nu _1(T)-\theta (T)-1}\big ) \\&\quad =:\sum _{k=1}^\ell X^{(5)}_k+O_{\prec }\big (N^{\nu _1(T)-\theta (T)-1}\big ). \end{aligned}$$

Similarly, the last term in (10.1) becomes

$$\begin{aligned}&-\sum _{k=1}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }\frac{1}{k!}{\mathcal {C}}_{k+1}(H_{i_1x}){\mathbb {E}} \frac{\partial ^k G_{xi_2}{\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }}{\partial H_{i_1x}^k} \\&\qquad +O_{\prec }\big (N^{\nu _1(T)-\theta (T)-1}\big )\\&\quad =:\sum _{k=1}^\ell X^{(6)}_k+O_{\prec }\big (N^{\nu _1(T)-\theta (T)-1}\big ). \end{aligned}$$

Let us estimate each \(X_k^{(5)}\) and \(X^{(6)}_k\).

For \(k=1\), by \({\mathcal {C}}_2(H_{ij})=N^{-1}(1+O(\delta _{ij}))\) and Lemma 2.8 we have

$$\begin{aligned} X^{(5)}_1= & {} -\sum _{i_1,\ldots ,i_{\nu _1},x,y} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta -2}{\mathbb {E}} G_{xx}G_{yy}G_{i_1i_2}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\\&+O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}} \Gamma +N^{-1})) \end{aligned}$$

and

$$\begin{aligned} X^{(6)}_1= & {} \sum _{i_1,\ldots ,i_{\nu _1},x} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta -1}{\mathbb {E}} G_{xx}{\underline{G}} \,G_{i_1i_2}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\\&+O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}} \Gamma +N^{-1})). \end{aligned}$$

Notice the cancellation between the above two equations. This gives

$$\begin{aligned} X^{(5)}_1+X^{(6)}_1=O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}} \Gamma +N^{-1})). \end{aligned}$$
(10.2)

For \(k=2\), the most dangerous type of term in \(X_2^{(6)}\) contains only one off-diagonal entry of G, e.g.

$$\begin{aligned} -\sum _{i_1,\ldots ,i_{\nu _1},x} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathcal {C}}_{3}(H_{i_1x}){\mathbb {E}} G_{xi_2} G_{i_1i_1}G_{xx}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }=:X^{(6)}_{2,1}.\quad \quad \end{aligned}$$
(10.3)

Note that (10.3) can be written as \({\mathbb {E}}{\mathcal {S}}(T')\), where \(T' \in {\mathcal {T}}\) and \(\nu _1(T') = \nu _1(T) + 1\), \(\theta (T') = \theta (T) + 1 + \beta \), and \(\sigma (T')=\sigma (T)+1\). When a term in \(X_2^{(6)}\) contains at least two off-diagonal entries of G, one can use Lemma 2.8 to show that it is bounded by \(O_{\prec }(N^{\nu _1-\theta }{\mathbb {E}} \Gamma )\). A similar argument works for all \(X_k^{(5)}\) and \(X^{(6)}_k\) when \(k \geqslant 2\).

To sum up, we have

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T)=\sum _{l=1}^m {\mathbb {E}} \,{\mathcal {S}}(T^{(l)})+O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}}\Gamma +N^{-1})) \end{aligned}$$
(10.4)

for some fixed integer m. Each \(T^{(l)}\) satisfies \(\nu _1(T^{(l)})=\nu _1(T)+1\), \(\sigma (T^{(l)}) \geqslant \sigma (T)+1\), \(\theta (T^{(l)})=\theta (T)+1+\beta (\sigma (T^{(l)})-\sigma (T))\) and \(\nu _2(T^{(l)})=1\), which implies

$$\begin{aligned} {\mathbb {E}}\, {\mathcal {S}}(T^{(l)})\prec N^{\nu _1(T)-\theta (T)-\beta (\sigma (T^{(l)})-\sigma (T))}\prec N^{\nu _1(T)-\theta (T)-\beta }. \end{aligned}$$

Note that we can repeat (10.4) for each \({\mathbb {E}}\, {\mathcal {S}}(T^{(l)})\), and get

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T^{(l)})=\sum _{l'=1}^{m'} {\mathbb {E}} \,{\mathcal {S}}(T^{(l,l')})+O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}}\Gamma +N^{-1})), \end{aligned}$$

and each \({\mathbb {E}} \,{\mathcal {S}}(T^{(l,l')})\) satisfies \( {\mathbb {E}}\, {\mathcal {S}}(T^{(l,l')})\prec N^{\nu _1(T)-\theta (T)-2\beta }. \) Repeating the step (10.4) \(\lceil {\beta ^{-1}} \rceil \) times concludes the proof of Lemma 5.3.
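To summarize the bookkeeping: each application of (10.4) improves the bound by a factor \(N^{-\beta }\), so that after r iterations every surviving term satisfies

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}\big (T^{(l_1,\ldots ,l_r)}\big ) \prec N^{\nu _1(T)-\theta (T)-r\beta }, \end{aligned}$$

and the choice \(r=\lceil {\beta ^{-1}} \rceil \) makes this \(O(N^{\nu _1(T)-\theta (T)-1})\), which is absorbed into the error \(O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}}\Gamma +N^{-1}))\).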

10.2 Proof of Lemma 5.5

Let \(T_{i_1,\ldots ,i_{\nu _1}}=a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }G_{x_1x_1}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\), where \(x_1,\ldots ,x_\sigma \in \{i_1,\ldots ,i_{\nu _1}\}\), and \(a_{i_1,\ldots ,i_{\nu _1}}\) is uniformly bounded. Using Lemma 10.1 we have

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}} (T)&=\sum _{i_1,\ldots ,i_{\nu _1}} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} {\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\\&\quad -\sum _{i_1,\ldots ,i_{\nu _1},x} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} H_{x_1x}G_{xx_1}{\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\\&\quad +\sum _{i_1,\ldots ,i_{\nu _1},x,y} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta -1}{\mathbb {E}} H_{xy}G_{yx}G_{x_1x_1}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }. \end{aligned}$$

Now let us expand the last two terms by Lemma 2.1. As in Sect. 10.1, we shall see a cancellation among the leading terms, which gives

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}} (T)&=\sum _{i_1,\ldots ,i_{\nu _1}} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} {\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\nonumber \\&\quad -\sum _{k=2}^{\ell }\sum _{i_1,\ldots ,i_{\nu _1},x} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }\frac{1}{k!}{\mathcal {C}}_{k+1}(H_{x_1x}){\mathbb {E}} \frac{\partial ^k G_{xx_1}{\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }}{\partial H_{x_1x}^k} \nonumber \\&\quad +\sum _{k=2}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x,y} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta -1}\frac{1}{k!}{\mathcal {C}}_{k+1}(H_{xy}){\mathbb {E}} \frac{\partial ^k G_{yx}G_{x_1x_1}G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }}{\partial H_{xy}^k} \nonumber \\&\quad +O_{\prec }\big (N^{\nu _1(T)-\theta (T)}({\mathbb {E}} \Gamma +N^{-1})\big ). \end{aligned}$$
(10.5)

For the terms on the right-hand side of (10.5) that are not in \({\mathcal {T}}_0\), we can use Lemma 5.3 to show that they are bounded by \(O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}}\Gamma +N^{-1}))\). As a result, we find

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(T)=&\sum _{i_1,\ldots ,i_{\nu _1}} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} {\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }\\&+\sum _{l=1}^m {\mathbb {E}} \,{\mathcal {S}}(T^{(l)})+O_{\prec }(N^{\nu _1(T)-\theta (T)}({\mathbb {E}}\Gamma +N^{-1})) \end{aligned} \end{aligned}$$
(10.6)

for some fixed integer m. Each \(T^{(l)}\) satisfies \(T^{(l)} \in {\mathcal {T}}_0\), \(\nu _1(T^{(l)})=\nu _1(T)+1\), \(\sigma (T^{(l)})- \sigma (T)\in 2{\mathbb {N}}+4\), and \(\theta (T^{(l)})=\theta (T)+1+\beta (\sigma (T^{(l)})-\sigma (T)-2)\). We can then repeat (10.6) on the term

$$\begin{aligned} \sum _{i_1,\ldots ,i_{\nu _1}} a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }{\mathbb {E}} {\underline{G}} \,G_{x_2x_2}\ldots G_{x_\sigma x_\sigma }. \end{aligned}$$

After \(\sigma -1\) further repetitions we get the desired result. This concludes the proof of Lemma 5.5.

10.3 Proof of Lemma 7.4

(i) Let V be of the form (7.16). By Lemma 2.8, we see that the result is trivially true for \(\nu _2\geqslant 2\), and hence we assume \(\nu _2=1\). Define

$$\begin{aligned} \begin{aligned} {\mathcal {E}}_2(V)&:=N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta })^{\nu _4} (N\eta )^{-\nu _5}\Upsilon {\mathbb {E}} |P^{\nu _3}| \\&\quad + \sum _{t=1}^{\nu _3} N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta })^{\nu _4} \Upsilon ^{\nu _5}((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{\nu _3-t}|. \end{aligned} \end{aligned}$$

By the definition of \(\nu _2\), we consider two cases.

Case 1 The contribution to \(\nu _2\) comes from \(G_{x_1y_1}G_{x_2y_2}\ldots G_{x_ky_k}\). Without loss of generality, we assume \(x_1 \ne y_1\), and \(x_1=i_1,y_1=i_2\). Furthermore, we denote \({\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}=V_{i_1,\ldots ,i_{\nu _1}}/G_{i_1i_2}\). From Lemma 10.1 we know that

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V)= & {} \sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}\delta _{i_1i_2} {\underline{G}} \,+\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} G_{i_1i_2}{\underline{HG}} \,\nonumber \\&-\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} (HG)_{i_1i_2}{\underline{G}} \,. \end{aligned}$$
(10.7)

By Lemma 2.1 and estimating the remainder term for large enough \(\ell \), the second last term in (10.7) becomes

$$\begin{aligned}&\sum _{k=1}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x,y} N^{-1}\frac{1}{k!}{\mathcal {C}}_{k+1}(H_{xy}){\mathbb {E}} \frac{\partial ^k {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}G_{i_1i_2}G_{yx}}{\partial H_{xy}^k} \nonumber \\&\quad +O_{\prec }(N^{\nu _1(V)-\theta (V)-2}{\mathbb {E}} |P^{\nu _3}|), \end{aligned}$$
(10.8)

and we denote the first sum by \(\sum _{k=1}^\ell X^{(7)}_k\). Similarly, the last term in (10.7) becomes

$$\begin{aligned}&-\sum _{k=1}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x} \frac{1}{k!}{\mathcal {C}}_{k+1}(H_{i_1x}){\mathbb {E}} \frac{\partial ^k {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}G_{xi_2}{\underline{G}} \,}{\partial H_{i_1x}^k} \nonumber \\&\quad +O_{\prec }(N^{\nu _1(V)-\theta (V)-2}{\mathbb {E}} |P^{\nu _3}|), \end{aligned}$$
(10.9)

and we denote the first sum by \(\sum _{k=1}^\ell X^{(8)}_k\). Similarly to Sect. 10.1, we see that when expanded by Lemma 2.1, the leading terms of \(X_1^{(7)}\) and \(X_1^{(8)}\) cancel, and together with Lemma 7.1 we can show that

$$\begin{aligned} X_1^{(7)}+X_1^{(8)} \prec {\mathcal {E}}_2(V). \end{aligned}$$

For \(k=2\), the most dangerous type of term in \(X_2^{(8)}\) contains \(\nu _3\) factors of P, and only one off-diagonal entry of G or \(G^2\), e.g.

$$\begin{aligned} -\sum _{i_1,\ldots ,i_{\nu _1},x}{\mathcal {C}}_{3}(H_{i_1x}){\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}{G_{xx}}G_{i_1i_1}G_{xi_2}{\underline{G}} \,=:X^{(8)}_{2,1}. \end{aligned}$$

Note that this term can be written as \({\mathbb {E}}{\mathcal {S}}(V')\), where \(V' \in {\mathcal {V}}\), \(\nu _1(V') = \nu _1(V) + 1\), \(\theta (V') = \theta (V) + 1 + \beta \), \(\sigma (V')=\sigma (V)+1\), and \(\nu _i(V')=\nu _i(V)\) for \(i=2,3,4,5\). When a term in \(X_2^{(8)}\) contains at least two factors of off-diagonal entries of G or \(G^2\), or the differential \(\partial ^2/\partial H^2_{i_1x}\) hits \(P^{\nu _3}\), one can easily use Lemma 7.1 to show that it is bounded by \(O_{\prec }(N^{\nu _1(V)-\theta (V)-2}{\mathbb {E}} |P^{\nu _3}|)\). A similar argument works for all \(X_k^{(7)}\) and \(X^{(8)}_k\) when \(k \geqslant 2\).

To sum up, we have

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(V)=\sum _{l=1}^m {\mathbb {E}} \,{\mathcal {S}}(V^{(l)})+O_{\prec }({\mathcal {E}}_2(V)) \end{aligned}$$
(10.10)

for some fixed integer m. Each \(V^{(l)}\) satisfies \(\nu _1(V^{(l)})=\nu _1(V)+1\), \(\sigma (V^{(l)}) \geqslant \sigma (V)+1\), \(\theta (V^{(l)})=\theta (V)+1+\beta (\sigma (V^{(l)})-\sigma (V))\), and \(\nu _i(V^{(l)})=\nu _i(V)\) for \(i=2,3,4,5\). Thus Lemma 7.1 implies

$$\begin{aligned} {\mathbb {E}}\, {\mathcal {S}}(V^{(l)})\prec N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta })^{\nu _4} (N\eta )^{-\nu _5}\Upsilon ^{1/2}{\mathbb {E}} |P^{\nu _3}|\cdot N^{-\beta }. \end{aligned}$$

Note that we can repeat (10.10) for each \({\mathbb {E}} \,{\mathcal {S}}(V^{(l)})\) on the right-hand side of (10.10). Doing this \(\lceil {(2\beta )^{-1}} \rceil \) times concludes the proof.

Case 2 The contribution to \(\nu _2\) comes from \(N^{-1}(G^2)_{xy}\). Without loss of generality, we assume \(x=i_1,y=i_2\), and we denote \({\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}=V_{i_1,\ldots ,i_{\nu _1}}/(G^2)_{i_1i_2}\). Note that

$$\begin{aligned} (G^2)_{ij}=G_{ij}{\underline{G}} \,+(G^2)_{ij}{\underline{HG}} \,-(HG^2)_{ij}{\underline{G}} \,, \end{aligned}$$

which follows as in Lemma 10.1, now from \(HG^2=G+zG^2\). Hence

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V)&=\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}G_{i_1i_2} {\underline{G}} \,+\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}} (G^2)_{i_1i_2}{\underline{HG}} \, \nonumber \\&\quad -\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}} (HG^2)_{i_1i_2}{\underline{G}} \, \nonumber \\&=\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}} (G^2)_{i_1i_2}{\underline{HG}} \, \nonumber \\&\quad -\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}} (HG^2)_{i_1i_2}{\underline{G}} \,+O_{\prec }({\mathcal {E}}_2(V)). \end{aligned}$$
(10.11)

We can then expand the first two terms on the right-hand side of (10.11) using Lemma 2.1. The first term on the right-hand side of (10.11) gives

$$\begin{aligned}&\sum _{k=1}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x,y} N^{-1}\frac{1}{k!}{\mathcal {C}}_{k+1}(H_{xy}){\mathbb {E}} \frac{\partial ^k {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}(G^2)_{i_1i_2}G_{yx}}{\partial H_{xy}^k} \nonumber \\&\quad +O_{\prec }(N^{\nu _1(V)-\theta (V)-2}{\mathbb {E}} |P^{\nu _3}|), \end{aligned}$$
(10.12)

and we abbreviate the first sum above by \( \sum _{k=1}^{\ell }X_{k}^{(9)}\). The second term on the right-hand side of (10.11) gives

$$\begin{aligned} -\sum _{k=1}^\ell \sum _{i_1,\ldots ,i_{\nu _1},x} \frac{1}{k!}{\mathcal {C}}_{k+1}(H_{i_1x}){\mathbb {E}} \frac{\partial ^k {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}(G^2)_{xi_2}{\underline{G}} \,}{\partial H_{i_1x}^k} +O_{\prec }(N^{\nu _1(V)-\theta (V)-2}{\mathbb {E}} |P^{\nu _3}|), \end{aligned}$$

and we abbreviate the first sum above by \( \sum _{k=1}^{\ell }X_{k}^{(10)}\). By (7.4), we see that

$$\begin{aligned} X_1^{(9)}= -\sum _{i_1,\ldots ,i_{\nu _1},x,y} N^{-2}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}(G^2)_{i_1i_2}G_{xx}G_{yy}+O_{\prec }({\mathcal {E}}_2(V)), \end{aligned}$$

and

$$\begin{aligned} X_1^{(10)}= \sum _{i_1,\ldots ,i_{\nu _1},x} N^{-1}{\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}\big ((G^2)_{i_1i_2}G_{xx}+(G^2)_{xx}G_{i_1i_2}\big ){\underline{G}} \,+O_{\prec }({\mathcal {E}}_2(V)). \end{aligned}$$

Thus there is a cancellation between \( X_1^{(9)}\) and \( X_1^{(10)}\), which shows

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V)=\sum _{i_1,\ldots ,i_{\nu _1}} {\mathbb {E}} {\widetilde{V}}_{i_1,\ldots ,i_{\nu _1}}{\underline{G^2}} \,G_{i_1i_2}{\underline{G}} \, +\sum _{k=2}^{\ell }X_{k}^{(9)} +\sum _{k=2}^{\ell }X_{k}^{(10)} +O_{\prec }({\mathcal {E}}_2(V)). \end{aligned}$$
(10.13)

The first term on the right-hand side of (10.13) is the leading term, and it no longer contains \((G^2)_{i_1i_2}\). The rest of the proof is analogous to Case 1. We omit the details.

(ii) Let \(V \in {\mathcal {V}}\) satisfy \(\nu _2(V)\ne 0\) and \(\nu _4(V)=\nu _5(V)=0\). From the result in (i), we have the bound

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}} (V)\prec & {} N^{\nu _1(V)-\theta (V)} \Upsilon {\mathbb {E}} |P^{\nu _3(V)}| \\&+\sum _{t=1}^{\nu _3(V)} N^{\nu _1(V)-\theta (V)}((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{\nu _3(V)-t}|, \end{aligned}$$

so that we only need to improve the bound for the term \(t=1\). Once again it suffices to assume \(\nu _2(V)=1\), and \(x_1=i_1\), \(y_1=i_2\). We denote \({\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}=V_{i_1,\ldots ,i_{\nu _1}}/G_{i_1i_2}\). As in (10.7)–(10.9), we have

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V)= & {} \sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}\delta _{i_1i_2} {\underline{G}} \,+\sum _{k=1}^\ell X^{(7)}_k \\&+\sum _{k=1}^\ell X^{(8)}_k+O_{\prec }(N^{\nu _1(V)-\theta (V)-2}{\mathbb {E}} |P^{\nu _3}|). \end{aligned}$$

Let us pick a term \({\mathcal {X}}\) in \(\sum _{k=1}^\ell X^{(7)}_k+\sum _{k=1}^\ell X^{(8)}_k\), which, we recall, are given by the sums in (10.8) and (10.9). When \(\nu _3({\mathcal {X}}) \ne \nu _3(V)-1\), we handle this term as in the proof of (i). When \(\nu _3({\mathcal {X}})=\nu _3(V)-1\), then from (7.4), we must have \(\nu _4({\mathcal {X}})=\nu _5({\mathcal {X}})=1\). Thus from (i), we have

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}({\mathcal {X}})&\prec {\mathcal {E}}_2({\mathcal {X}})\\&\prec N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta }) (N\eta )^{-1}\Upsilon {\mathbb {E}} |P^{\nu _3-1}| \\&\quad + \sum _{t=1}^{\nu _3-1} N^{\nu _1-\theta }(\Psi +\sqrt{\kappa +\eta }) \Upsilon ((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{\nu _3-1-t}|\\&\prec N^{\nu _1-\theta }\Upsilon ^{2}{\mathbb {E}} |P^{\nu _3-1}|+\sum _{t=2}^{\nu _3} N^{\nu _1-\theta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{t}{\mathbb {E}} |P^{\nu _3-t}| \end{aligned}$$

as desired. This concludes the proof of Lemma 7.4.

10.4 Proof of Lemma 7.5

Let \(V=a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }(P'N^{-1}(G^2)_{xx})^{\nu _4}G_{x_1x_1}G_{x_2x_2}\ldots G_{x_{k}x_{k}}{\underline{G}} \,^sP^{\nu _3}\). We abbreviate \({\widehat{V}}_{i_1,\ldots ,i_{\nu _1}}:=V_{i_1,\ldots ,i_{\nu _1}}/G_{x_1x_1}\), and denote

$$\begin{aligned} \begin{aligned} {\mathcal {E}}_3(V):=N^{\nu _1-\theta }\Upsilon ^{1+\nu _4}{\mathbb {E}} |P^{\nu _3}| +\sum _{t=1}^{\nu _3} N^{\nu _1-\theta }((\Psi +\sqrt{\kappa +\eta }) \Upsilon )^{\nu _4+t}{\mathbb {E}} |P^{\nu _3-t}|. \end{aligned} \end{aligned}$$

Using Lemma 10.1 we have

$$\begin{aligned} {\mathbb {E}} {\mathcal {S}}(V)= & {} \sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} {\underline{G}} \,+\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} G_{x_1x_1}{\underline{HG}} \,\\&-\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} (HG)_{x_1x_1}{\underline{G}} \,. \end{aligned}$$

Now let us expand the last two terms by Lemma 2.1. As in Sect. 10.1, we shall see a cancellation among the leading terms. For other terms that are not in \({\mathcal {T}}_0\), we can use Lemma 5.3 and show that they are bounded by \(O_{\prec }({\mathcal {E}}_3(V))\). As a result, we can show that

$$\begin{aligned} {\mathbb {E}} \,{\mathcal {S}}(V)=\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} {\underline{G}} \,+\sum _{l=1}^m {\mathbb {E}} \,{\mathcal {S}}(V^{(l)})+O_{\prec }({\mathcal {E}}_3(V)) \end{aligned}$$
(10.14)

for some fixed integer m. Each \(V^{(l)}\) satisfies \(V^{(l)} \in {\mathcal {V}}_0\), \(\nu _1(V^{(l)})=\nu _1(V)+1\), \(\sigma (V^{(l)})- \sigma (V)\in 2{\mathbb {N}}+4\), \(\theta (V^{(l)})=\theta (V)+1+\beta (\sigma (V^{(l)})-\sigma (V)-2)\), and \(\nu _i(V^{(l)})=\nu _i(V)\) for \(i=2,3,4,5\). One can then repeat the step (10.14) on the term

$$\begin{aligned} \sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} {\widehat{V}}_{i_1,\ldots ,i_{\nu _1}} {\underline{G}} \,. \end{aligned}$$

After k repetitions we conclude the proof of Lemma 7.5.

10.5 Proof of Lemma 7.7

The proof follows by repeatedly using the following result.

Lemma 10.2

Fix \(r,u,v\in {\mathbb {N}}\). For any fixed \(T \in {\mathcal {T}}_0\) there exist \(T^{(1)},\ldots ,T^{(k)} \in {\mathcal {T}}_0\) such that

$$\begin{aligned} {\mathbb {E}} [\partial _{w}({\mathcal {S}}(T)){\underline{G}} \,^u P^v]&={\mathbb {E}} [\partial _{w}({\mathcal {M}}(r,T)){\underline{G}} \,^u P^v]+\sum _{l=1}^k{\mathbb {E}}\, [\partial _{w}{\mathcal {S}}(T^{(l)}){\underline{G}} \,^uP^{v}] \nonumber \\&\quad +O_{\prec }\big (N^{\nu _1(T)-\theta (T)+1}\Upsilon ((N\eta )^{-1}+N^{-\beta (r+1)}){\mathbb {E}} |P|^v\big ) \nonumber \\&\quad +\sum _{t=1}^v O_{\prec }\big (N^{\nu _1(T)-\theta (T)+1}\Upsilon ((\Psi +\sqrt{\kappa +\eta })\Upsilon )^t{\mathbb {E}} |P|^{v-t}\big ), \end{aligned}$$
(10.15)

where k is fixed. Each \(T^{(l)}\) satisfies \(\sigma (T^{(l)})-\sigma (T) \in 2{\mathbb {N}}+4\),

$$\begin{aligned} \nu _1(T^{(l)})=\nu _1(T)+1, \quad \text {and} \quad \theta (T^{(l)})=\theta (T)+1+\beta (\sigma (T^{(l)})-\sigma (T)-2). \end{aligned}$$

Proof of Lemma 10.2

We abbreviate the error, i.e. the last two terms on the right-hand side of (10.15), by \({\mathcal {E}}_4\equiv {\mathcal {E}}_4(T,u,v)\).

Let \(T_{i_1,\ldots ,i_{\nu _1}}=a_{i_1,\ldots ,i_{\nu _1}}N^{-\theta }G_{x_1x_1}G_{x_2x_2}\ldots G_{x_kx_k}\), where \(x_1,\ldots ,x_k \in \{i_1,\ldots ,i_{\nu _1}\}\) and \(a_{i_1,\ldots ,i_{\nu _1}}\) is uniformly bounded. We abbreviate \({\widehat{T}}_{i_1,\ldots ,i_{\nu _1}}:=T_{i_1,\ldots ,i_{\nu _1}}/G_{x_1x_1}\). By Lemma 10.1, we have

$$\begin{aligned} {\mathbb {E}} [\partial _{w}({\mathcal {S}}(T)){\underline{G}} \,^u P^v]&=\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} [\partial _{w}({\widehat{T}}_{i_1,\ldots ,i_{\nu _1}}{\underline{G}} \,){\underline{G}} \,^u P^v] \nonumber \\&\quad +\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} [\partial _{w}({\widehat{T}}_{i_1,\ldots ,i_{\nu _1}}G_{x_1x_1}{\underline{HG}} \,){\underline{G}} \,^u P^v] \nonumber \\&\quad -\sum _{i_1,\ldots ,i_{\nu _1}}{\mathbb {E}} [\partial _{w}({\widehat{T}}_{i_1,\ldots ,i_{\nu _1}}(HG)_{x_1x_1}{\underline{G}} \,){\underline{G}} \,^u P^v]. \end{aligned}$$
(10.16)

Now we expand the last two terms in (10.16) using Lemma 2.1. Note that we have

$$\begin{aligned} \partial _{w}(H_{ij})=0 \quad \text {and} \quad \bigg [\partial _{w} ,\frac{\partial }{\partial H_{ij}}\bigg ]=0. \end{aligned}$$

The rest of the proof is analogous to that of Lemma 5.3. We omit the details. \(\square \)