
Moment Bounds for Large Autocovariance Matrices Under Dependence


Abstract

The goal of this paper is to obtain expectation bounds for the deviation of large sample autocovariance matrices from their means under weak data dependence. While the accuracy of covariance matrix estimation for independent data is well understood, much less is known in the case of dependent data. We take a step toward filling this gap and establish deviation bounds that depend only on the parameters controlling the “intrinsic dimension” of the data, up to some logarithmic terms. Our results have immediate impact on high-dimensional time-series analysis, and we apply them to the high-dimensional linear VAR(d) model, the vector-valued ARCH model, and a model studied in Banna et al. (Random Matrices Theory Appl 5(2):1650006, 2016).


References

  1. Andrews, D.W.: Non-strong mixing autoregressive processes. J. Appl. Probab. 21(4), 930–934 (1984)

  2. Bai, Z., Yin, Y.: Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 21(3), 1275–1294 (1993)

  3. Banna, M., Merlevède, F., Youssef, P.: Bernstein-type inequality for a class of dependent random matrices. Random Matrices Theory Appl. 5(2), 1650006 (2016)

  4. Berbee, H.C.: Random Walks with Stationary Increments and Renewal Theory, vol. 112. Mathematisch Centrum, Amsterdam (1979)

  5. Blinn, J.: Consider the lowly \(2 \times 2\) matrix. IEEE Comput. Graph. Appl. 16(2), 82–88 (1996)

  6. Brand, M.: Fast low-rank modifications of the thin singular value decomposition. Linear Algebra Appl. 415(1), 20–30 (2006)

  7. Brillinger, D.R.: Time Series: Data Analysis and Theory. SIAM, Philadelphia (2001)

  8. Bunea, F., Xiao, L.: On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. Bernoulli 21(2), 1200–1230 (2015)

  9. Chang, J., Guo, B., Yao, Q.: Principal component analysis for second-order stationary vector time series. Ann. Stat. 46(5), 2094–2124 (2018)

  10. Chen, X., Xu, M., Wu, W.B.: Covariance and precision matrix estimation for high-dimensional time series. Ann. Stat. 41(6), 2994–3021 (2013)

  11. Davis, C., Kahan, W.M.: The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7(1), 1–46 (1970)

  12. Dedecker, J., Doukhan, P., Lang, G., Leon, J., Louhichi, S., Prieur, C.: Weak Dependence: With Examples and Applications. Springer, New York (2007)

  13. Dedecker, J., Prieur, C.: Coupling for \(\tau \)-dependent sequences and applications. J. Theor. Probab. 17(4), 861–885 (2004)

  14. Han, F., Liu, H.: ECA: high-dimensional elliptical component analysis in non-Gaussian distributions. J. Am. Stat. Assoc. 113(521), 252–268 (2018)

  15. Koltchinskii, V., Lounici, K.: Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23(1), 110–133 (2017a)

  16. Koltchinskii, V., Lounici, K.: New asymptotic results in principal component analysis. Sankhya A 79(2), 254–297 (2017b)

  17. Koltchinskii, V., Lounici, K.: Normal approximation and concentration of spectral projectors of sample covariance. Ann. Stat. 45(1), 121–157 (2017c)

  18. Liu, W., Xiao, H., Wu, W.B.: Probability and moment inequalities under dependence. Stat. Sin. 23(3), 1257–1272 (2013)

  19. Lounici, K.: High-dimensional covariance matrix estimation with missing observations. Bernoulli 20(3), 1029–1058 (2014)

  20. Mendelson, S.: Empirical processes with a bounded \(\psi _1\) diameter. Geom. Funct. Anal. 20(4), 988–1027 (2010)

  21. Mendelson, S., Paouris, G.: On the singular values of random matrices. J. Eur. Math. Soc. 16, 823–834 (2014)

  22. Merlevède, F., Peligrad, M., Rio, E.: Bernstein inequality and moderate deviations under strong mixing conditions. In: High Dimensional Probability V: The Luminy Volume, pp. 273–292. Institute of Mathematical Statistics, Beachwood (2009)

  23. Merlevède, F., Peligrad, M., Rio, E.: A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Relat. Fields 151(3), 435–474 (2011)

  24. Oliveira, R.: Sums of random Hermitian matrices and an inequality by Rudelson. Electron. Commun. Probab. 15, 203–212 (2010)

  25. Petz, D.: A survey of certain trace inequalities. Banach Center Publ. 30(1), 287–298 (1994)

  26. Rudelson, M.: Random vectors in the isotropic position. J. Funct. Anal. 164(1), 60–72 (1999)

  27. Slepian, D.: The one-sided barrier problem for Gaussian noise. Bell Syst. Tech. J. 41(2), 463–501 (1962)

  28. Srivastava, N., Vershynin, R.: Covariance estimation for distributions with \(2+\epsilon \) moments. Ann. Probab. 41(5), 3081–3111 (2013)

  29. Talagrand, M.: Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer, Berlin (2014)

  30. Tikhomirov, K.: Sample covariance matrices of heavy-tailed distributions. Int. Math. Res. Not. 2018(20), 6254–6289 (2017)

  31. Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1–2), 1–230 (2015)

  32. van Handel, R.: Structured random matrices. In: Convexity and Concentration, vol. 161, pp. 107–156. Springer, Berlin (2017)

  33. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing, pp. 210–268. Cambridge University Press, Cambridge (2012)

  34. Wu, W.B.: Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. U.S.A. 102(40), 14150–14154 (2005)

  35. Wu, W.B., Wu, Y.N.: Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron. J. Stat. 10(1), 352–379 (2016)


Author information

Correspondence to Fang Han.


Appendix

A Proof of Theorem 4.3

In this Appendix, we present the proof of Theorem 4.3, which slightly extends the Bernstein-type inequality proved by Banna et al. [3], where the random matrix sequence is assumed to be \(\beta \)-mixing. The proof is largely identical to theirs, and we include it here mainly for completeness.

In the following, \(\tau _k\) is an abbreviation for \(\tau (k)\) for \(k \ge 1\). If a matrix \(\mathbf {X}\) is positive semi-definite, we write \(\mathbf {X}\succeq 0\). For any \(x>0\), we define \(h(x) = x^{-2}(e^x - x - 1)\). Denote the floor, ceiling, and integer parts of a real number x by \(\lfloor x\rfloor \), \(\lceil x \rceil \), and [x], respectively. For any two real numbers a, b, denote \(a\vee b := \max \{a,b\}\). Denote the exponential of a matrix \(\mathbf {X}\) as \(\exp (\mathbf {X}) = \mathbf {I}_p + \sum _{q = 1}^{\infty }\mathbf {X}^q/q!\). For two sigma fields \(\sigma _1\) and \(\sigma _2\), denote by \(\sigma _1 \vee \sigma _2\) the smallest sigma field that contains both \(\sigma _1\) and \(\sigma _2\) as sub-sigma fields.
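As a quick illustration of this notation, here is a minimal numerical sketch (ours, not part of the paper; the \(2\times 2\) test matrix is an arbitrary choice) of the function \(h\) and the matrix exponential:

```python
import math

import numpy as np

def h(x: float) -> float:
    """h(x) = x^{-2}(e^x - x - 1); nondecreasing on (0, infinity)."""
    return (math.exp(x) - x - 1.0) / x**2

# Matrix exponential via the truncated power series exp(X) = I_p + sum_q X^q/q!.
X = np.array([[0.0, 1.0], [1.0, 0.0]])  # an arbitrary symmetric 2x2 matrix
expX = sum(np.linalg.matrix_power(X, q) / math.factorial(q) for q in range(20))
# For this particular X, exp(X) = [[cosh 1, sinh 1], [sinh 1, cosh 1]].
assert np.allclose(expX, [[np.cosh(1), np.sinh(1)], [np.sinh(1), np.cosh(1)]])
print(f"h(4) = {h(4.0):.4f}")  # h(4) ~ 3.0999, the constant appearing in Lemma A.4
```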

A roadmap of this Appendix is as follows. Section A.1 formally introduces the \(\tau \)-mixing coefficient. Section A.2 previews the proof of Theorem 4.3 and indicates some major differences from the proofs in [3]. Section A.3 contains the construction of the Cantor-like set, which is essential for decoupling dependent matrices. Section A.4 develops a major decoupling lemma for \(\tau \)-mixing random matrices, which is used in Sect. A.6 to prove Lemma A.4. Section A.5 then finishes the proof of Theorem 4.3.

A.1 Introduction to \(\tau \)-Mixing Random Sequences

This section introduces the \(\tau \)-mixing coefficient. Let \((\Omega ,\mathcal {F},{{\mathbb {P}}})\) be a probability space, X an \(L_1\)-integrable random variable taking values in a Polish space \(({\mathcal {X}}, \Vert \cdot \Vert _{\mathcal {X}})\), and \(\mathcal {A}\) a sub-sigma algebra of \(\mathcal {F}\). The \(\tau \)-measure of dependence between X and \(\mathcal {A}\) is defined as

$$\begin{aligned} \tau ({\mathcal {A}}, X;\Vert \cdot \Vert _{\mathcal {X}}) = \Big \Vert \sup _{g \in \Lambda (\Vert \cdot \Vert _{\mathcal {X}})}\Big \{\int g(x)\,{{\mathbb {P}}}_{X|{\mathcal {A}}}(\mathrm{d}x) - \int g(x)\,{{\mathbb {P}}}_X(\mathrm{d}x) \Big \}\Big \Vert _{L_1}, \end{aligned}$$

where \({{\mathbb {P}}}_X\) is the distribution of X, \({{\mathbb {P}}}_{X|{\mathcal {A}}}\) is the conditional distribution of X given \({\mathcal {A}}\), and \(\Lambda (\Vert \cdot \Vert _{\mathcal {X}})\) stands for the set of 1-Lipschitz functions from \({\mathcal {X}}\) to \({{{\mathbb {R}}}}\) with respect to the norm \(\Vert \cdot \Vert _{\mathcal {X}}\).

The following two lemmas from [13] and [12] characterize the intrinsic “coupling property” of the \(\tau \)-measure of dependence, which will be heavily exploited in the derivation of our results.

Lemma A.1

(Lemma 3 in [13]) Let \((\Omega , {\mathcal {F}}, {{\mathbb {P}}})\) be a probability space, X an integrable random variable with values in a Banach space \(({\mathcal {X}}, \Vert \cdot \Vert _{\mathcal {X}})\), and \({\mathcal {A}}\) a sub-sigma algebra of \({\mathcal {F}}\). If Y is a random variable distributed as X and independent of \({\mathcal {A}}\), then

$$\begin{aligned} \tau ({\mathcal {A}}, X;\Vert \cdot \Vert _{\mathcal {X}}) \le {{\mathbb {E}}}\Vert X-Y\Vert _{\mathcal {X}}. \end{aligned}$$

Lemma A.2

(Lemma 5.3 in [12]) Let \((\Omega , {\mathcal {F}}, {{\mathbb {P}}})\) be a probability space, \({\mathcal {A}}\) a sub-sigma algebra of \({\mathcal {F}}\), and X a random variable with values in a Polish space \(({\mathcal {X}}, \Vert \cdot \Vert _{\mathcal {X}})\). Assume that \(\int \Vert x-x_0\Vert _{\mathcal {X}}\,{{\mathbb {P}}}_X(\mathrm{d}x)\) is finite for any \(x_0 \in {\mathcal {X}}\). Assume that there exists a random variable U uniformly distributed over [0, 1], independent of the sigma algebra generated by X and \({\mathcal {A}}\). Then, there exists a random variable \(\widetilde{X}\), measurable with respect to \({\mathcal {A}}\vee \sigma (X) \vee \sigma (U)\), independent of \({\mathcal {A}}\) and distributed as X, such that

$$\begin{aligned} \tau ({\mathcal {A}}, X;\Vert \cdot \Vert _{\mathcal {X}}) = {{\mathbb {E}}}\Vert X-\widetilde{X}\Vert _{\mathcal {X}}. \end{aligned}$$

Let \(\{X_j\}_{j \in J}\) be a set of \({\mathcal {X}}\)-valued random variables with index set J of finite cardinality. Then, define

$$\begin{aligned} \tau ({\mathcal {A}}, \{X_j\}_{j \in J}; \Vert \cdot \Vert _{\mathcal {X}}) = \Big \Vert \sup _{g \in \Lambda (\Vert \cdot \Vert _{\mathcal {X}}')}\Big \{\int g(x)\,{{\mathbb {P}}}_{\{X_j\}_{j \in J}|{\mathcal {A}}}(\mathrm{d}x) - \int g(x)\,{{\mathbb {P}}}_{\{X_j \}_{j \in J}}(\mathrm{d}x) \Big \}\Big \Vert _{L_1}, \end{aligned}$$

where \({{\mathbb {P}}}_{\{X_j \}_{j \in J}}\) is the distribution of \(\{X_j \}_{j \in J}\), \({{\mathbb {P}}}_{\{X_j\}_{j \in J}|{\mathcal {A}}}\) is the conditional distribution of \(\{X_j \}_{j \in J}\) given \({\mathcal {A}}\), and \(\Lambda (\Vert \cdot \Vert _{\mathcal {X}}')\) stands for the set of 1-Lipschitz functions from \(\underbrace{{\mathcal {X}}\times \cdots \times {\mathcal {X}}}_{\mathop {\text {card}}(J)}\) to \({{{\mathbb {R}}}}\) with respect to the norm \(\Vert x\Vert _{\mathcal {X}}' := \sum _{j \in J}\Vert x_j\Vert _{\mathcal {X}}\) induced by \(\Vert \cdot \Vert _{\mathcal {X}}\), for any \(x=(x_1,\ldots ,x_{\mathop {\text {card}}(J)})\in \mathcal {X}^{{\mathop {\text {card}}}(J)}\).

Using these concepts, for a sequence of temporally dependent data \(\{X_t\}_{t\in {{\mathbb {Z}}}}\), we are ready to define a measure of temporal correlation strength as follows:

$$\begin{aligned}&\tau (k; \{X_t\}_{t \in {{\mathbb {Z}}}}, \Vert \cdot \Vert _{\mathcal {X}}) \\&\quad := \sup _{i > 0}\max _{1 \le \ell \le i} \frac{1}{\ell }\sup \{\tau \{\sigma (X_{-\infty }^a), \{X_{j_1}, \dots , X_{j_\ell }\}; \Vert \cdot \Vert _{\mathcal {X}}\}, a+k \le j_1< \dots < j_\ell \}, \end{aligned}$$

where the inner supremum is taken over all \(a \in {{\mathbb {Z}}}\) and all \(\ell \)-tuples \((j_1, \dots , j_\ell )\). \(\{X_t\}_{t\in {{\mathbb {Z}}}}\) is said to be \(\tau \)-mixing if \(\tau (k; \{X_t\}_{t \in {{\mathbb {Z}}}}, \Vert \cdot \Vert _{\mathcal {X}})\) converges to zero as \(k\rightarrow \infty \). In [12], the authors gave numerous examples of random sequences that are \(\tau \)-mixing.
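To illustrate Lemma A.1 and the geometric decay \(\tau (k) \le \psi _1\exp \{-\psi _2(k-1)\}\) assumed later in Theorem 4.3, here is a minimal Monte Carlo sketch (ours, not from the paper) for a stationary scalar AR(1) process: coupling two copies that share innovations after time 0 gives \(|X_k - \widetilde{X}_k| = \rho ^k|X_0 - \widetilde{X}_0|\), so the coupling bound decays geometrically with rate \(\psi _2 = \log (1/\rho )\).

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_rep, k_max = 0.7, 100_000, 10
sigma = 1.0 / np.sqrt(1.0 - rho**2)  # stationary std of X_t = rho*X_{t-1} + eps_t

# X starts from the stationary law; the coupled copy Xt starts from an
# independent draw of the same law (hence Xt is independent of sigma(X_0)),
# and both copies then share the same innovations.
X = sigma * rng.standard_normal(n_rep)
Xt = sigma * rng.standard_normal(n_rep)
for k in range(1, k_max + 1):
    eps = rng.standard_normal(n_rep)
    X, Xt = rho * X + eps, rho * Xt + eps
    mc = np.abs(X - Xt).mean()                   # Monte Carlo estimate of E|X_k - X~_k|
    exact = rho**k * 2 * sigma / np.sqrt(np.pi)  # rho^k * E|X_0 - X~_0|
    print(f"k={k:2d}  coupling bound ~ {mc:.4f}  (exact {exact:.4f})")
```

By Lemma A.1, each printed value upper-bounds \(\tau \{\sigma (X_0), X_k; |\cdot |\}\), and it decays like \(\rho ^k = \exp \{-k\log (1/\rho )\}\), i.e., geometrically.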

A.2 Overview of the Proof of Theorem 4.3

The proof of Theorem 4.3 largely follows the proof of Theorem 1 in [3]. Section A.3 reviews the Cantor-like set construction developed and used in [22] and [3]. Lemma A.3 is a slight extension of Lemma 8 in [3]. The major difference is that the 0–1 function used to quantify the distance between two random matrices under \(\beta \)-mixing via Berbee’s decoupling lemma [4] is replaced by an absolute distance function, which is used under \(\tau \)-mixing via Lemma A.1 [13]. The proofs of Lemma A.4 and of the rest of Theorem 4.3 largely follow the proofs of Proposition 7 and Theorem 1 in [3], respectively, though with more algebra involved.

A.3 Construction of the Cantor-Like Set

We follow [3] to construct the Cantor-like set \(K_B\) for \(\lbrace 1, \dots , B\rbrace \). Let \(\delta = \frac{\log 2}{2\log B}\) and \(\ell _B = \sup \lbrace k\in {{\mathbb {Z}}}^+: \frac{B\delta (1-\delta )^{k-1}}{2^k} \ge 2\rbrace \). We abbreviate \(\ell := \ell _B\). Let \(n_0 = B\) and for \(j\in \lbrace 1,\dots , \ell \rbrace \),

$$\begin{aligned} n_j = \Big \lceil \frac{B(1-\delta )^j}{2^j}\Big \rceil ~~\mathrm{and}~~d_{j-1} = n_{j-1} - 2n_j. \end{aligned}$$

We start from the set \(\lbrace 1,\dots , B\rbrace \) and divide the set into three disjoint subsets \(I_1^1, J_0^1, I_1^2\) so that \(\mathop {\text {card}}(I_1^1) = \mathop {\text {card}}(I_1^2) = n_1\) and \(\mathop {\text {card}}(J_0^1) = d_0\). Specifically,

$$\begin{aligned} I_1^1 = \{1, \dots , n_1\},\ J_0^1 = \{n_1+1, \dots , n_1+d_0\},\ I_1^2 = \{n_1 + d_0+1, \dots , 2n_1 + d_0\}, \end{aligned}$$

where \(B = 2n_1 + d_0\). Then we divide \(I_1^1\) and \(I_1^2\), leaving \(J_0^1\) unchanged. \(I_1^1\) is divided into three disjoint subsets \(I_2^1, J_1^1, I_2^2\) in the same way as in the previous step, with \(\mathop {\text {card}}(I_2^1) = \mathop {\text {card}}(I_2^2) = n_2\) and \(\mathop {\text {card}}(J_1^1) = d_1\). We obtain

$$\begin{aligned} I_2^1 = \{1,\dots , n_2\},\ J_1^1 = \{n_2+1, \dots , n_2 + d_1\},\ I_2^2 = \{n_2 + d_1 + 1, \dots , 2n_2+d_1\}, \end{aligned}$$

where \(n_1 = 2n_2+d_1\). Similarly, \(I_1^2\) is divided into \(I_2^3, J_1^2, I_2^4\) with \(\mathop {\text {card}}(I_2^3) = \mathop {\text {card}}(I_2^4) = n_2\) and \(\mathop {\text {card}}(J_1^2) = d_1\). We obtain

$$\begin{aligned} I_2^3&= \{2n_2 + d_0 + d_1+1,\dots , 3n_2 + d_0 + d_1\},\ J_1^2 = \{3n_2 + d_0 + d_1+1, \dots , 3n_2 + d_0 + 2d_1\},\\ I_2^4&= \{3n_2 + d_0 + 2d_1 + 1, \dots , 4n_2 + d_0 + 2d_1\}, \end{aligned}$$

where \(B = 4n_2 + d_0 + 2d_1\).

Suppose we have iterated this process k times (\(k \in \lbrace 1,\dots , \ell \rbrace \)), obtaining intervals \(I_k^i,\ i \in \lbrace 1,\dots , 2^k\rbrace \). We divide each \(I_k^i\) into three disjoint subsets \(I_{k+1}^{2i-1}, J_{k}^i, I_{k+1}^{2i}\) so that \(\mathop {\text {card}}(I_{k+1}^{2i-1}) = \mathop {\text {card}}(I_{k+1}^{2i}) = n_{k+1}\) and \(\mathop {\text {card}}(J_{k}^i) = d_k\). More specifically, if \(I_k^i = \{a_k^i, \dots , b_k^i\}\), then

$$\begin{aligned}&I_{k+1}^{2i-1} = \{a_k^i, \dots , a_k^i + n_{k+1}-1\},\ J_k^i = \{a_k^i + n_{k+1}, \dots , a_k^i + n_{k+1} + d_k -1\},\\&I_{k+1}^{2i} = \{a_k^i + n_{k+1} + d_k, \dots , a_k^i + 2n_{k+1} + d_k-1\}. \end{aligned}$$

After \(\ell \) steps, we obtain \(2^\ell \) disjoint subsets \(I_\ell ^{i}, i\in \lbrace 1,\dots , 2^\ell \rbrace \) with \(\mathop {\text {card}}(I_\ell ^{i}) = n_\ell \). Then, the Cantor-like set is defined as

$$\begin{aligned} K_B = \bigcup \limits _{i = 1}^{2^\ell }I_{\ell }^i, \end{aligned}$$

and for each level \(k \in \lbrace 0,\dots , \ell \rbrace \) and each \(j \in \lbrace 1, \dots , 2^k\rbrace \), define

$$\begin{aligned} K_k^j = \bigcup \limits _{i = (j-1)2^{\ell -k}+1}^{j2^{\ell -k}}I_{\ell }^i. \end{aligned}$$

Some properties derived from this construction are given by Banna et al. [3] (a numerical check follows this list):

  1. \(\delta \le \frac{1}{2}\) and \(\ell \le \frac{\log B}{\log 2}\);

  2. \(d_j \ge \frac{B\delta (1-\delta )^j}{2^{j+1}}\) and \(n_\ell \le \frac{B(1-\delta )^\ell }{2^{\ell -1}}\);

  3. Each \(I_\ell ^i, i\in \lbrace 1,\dots , 2^\ell \rbrace \) contains \(n_\ell \) consecutive integers, and for any \(i\in \lbrace 1,\dots , 2^{\ell -1}\rbrace \), \(I_\ell ^{2i-1}\) and \(I_\ell ^{2i}\) are spaced by \(d_{\ell -1}\) integers;

  4. \(\mathop {\text {card}}(K_B) \ge \frac{B}{2}\);

  5. For each \(k \in \lbrace 0,\dots , \ell \rbrace \) and each \(j \in \lbrace 1, \dots , 2^k\rbrace \), \(\mathop {\text {card}}(K_k^j) = 2^{\ell -k}n_\ell \). For each \(j \in \lbrace 1, \dots , 2^{k-1}\rbrace \), \(K_k^{2j-1}\) and \(K_k^{2j}\) are spaced by \(d_{k-1}\) integers;

  6. \(K_0^1 = K_B\) and \(K_\ell ^j = I_\ell ^j\) for \(j \in \lbrace 1, \dots , 2^\ell \rbrace \).
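For concreteness, the following Python sketch implements the recursion above and checks properties 1 and 4 numerically; it assumes B is moderately large (say \(B \ge 50\)) so that \(\ell _B \ge 1\), a restriction of our simplified \(\ell _B\) computation, not of the paper's construction.

```python
import math

def cantor_like_set(B: int):
    """Cantor-like set K_B of {1,...,B} following Sect. A.3 (1-based indices)."""
    delta = math.log(2) / (2 * math.log(B))
    # ell_B = sup{k >= 1 : B*delta*(1-delta)^(k-1)/2^k >= 2}
    ell = max(k for k in range(1, B)
              if B * delta * (1 - delta) ** (k - 1) / 2**k >= 2)
    n = [B] + [math.ceil(B * (1 - delta) ** j / 2**j) for j in range(1, ell + 1)]
    d = [n[j - 1] - 2 * n[j] for j in range(1, ell + 1)]  # d[k] = d_k of the text
    intervals = [(1, B)]                                  # the single level-0 interval
    for k in range(ell):  # split each I_k^i into (n_{k+1} points, gap d_k, n_{k+1} points)
        intervals = [piece
                     for (a, b) in intervals
                     for piece in ((a, a + n[k + 1] - 1),
                                   (a + n[k + 1] + d[k], a + 2 * n[k + 1] + d[k] - 1))]
    K = [x for (a, b) in intervals for x in range(a, b + 1)]
    return K, ell

K, ell = cantor_like_set(1000)
assert len(K) >= 1000 / 2                   # property 4: card(K_B) >= B/2
assert ell <= math.log(1000) / math.log(2)  # property 1: ell <= log B / log 2
print(len(K), ell)
```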

A.4 A Decoupling Lemma for \(\tau \)-Mixing Random Matrices

This section introduces the key tool for decoupling \(\tau \)-mixing random matrices using the Cantor-like set constructed in Sect. A.3. With some abuse of notation, within this section we use \(\lbrace \mathbf {X}_j \rbrace _{j\in \{1,\ldots ,n\}}\) to denote a generic sequence of \(p\times p\) symmetric random matrices. Assume \({{\mathbb {E}}}(\mathbf {X}_j) = {\mathbf {0}}\) and \(||\mathbf {X}_j ||\le M\) for some positive constant M and all \(j\ge 1\). Consider a collection of index sets \(H^k_1,\ k\in \lbrace 1,\dots , d\rbrace \), whose cardinalities are assumed equal and even. Denote by \(\lbrace \mathbf {X}_j \rbrace _{j \in H^k_1}\) the set of matrices whose indices lie in \(H^k_1\). Assume \(\lbrace \mathbf {X}_j \rbrace _{j \in H_1^1}, \dots , \lbrace \mathbf {X}_j \rbrace _{j \in H^d_1}\) are mutually independent, while within each block \(H_1^k\) the matrices are possibly dependent. For each k, decompose \(H^k_1\) into two disjoint sets \(H^{2k-1}_2\) and \(H^{2k}_2\) of equal size, containing the first and second halves of \(H^k_1\), respectively. In addition, we assume \(\tau \{\sigma (\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k-1}_2}),\ \lbrace \mathbf {X}_j \rbrace _{j \in H^{2k}_2};\Vert \cdot \Vert \}\) takes a common value \(\tau _0 \ge 0\) for all \(k \in \lbrace 1,\dots , d\rbrace \). For a given \(\epsilon > 0\), we obtain the following decoupling lemma.

Lemma A.3

For any \(\epsilon >0\), we have

$$\begin{aligned}&{{\mathbb {E}}}\,\, \mathrm{{Tr}} \exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \\&\quad \le \sum \limits _{i = 0}^{d}\left( {\begin{array}{c}d\\ i\end{array}}\right) (1 + L_1 + L_2)^{d - i}(L_1)^{i}{{\mathbb {E}}}\,\, \mathrm{{Tr}} \exp \left\{ (-1)^i t\left( \sum \limits _{k = 1}^{2d}\sum _{j \in H^{k}_2} {\widetilde{\mathbf {X}}}_j \right) \right\} ,\\&{{\mathbb {E}}}\,\, \mathrm{{Tr}} \exp \left( -t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \\&\quad \le \sum \limits _{i = 0}^{d}\left( {\begin{array}{c}d\\ i\end{array}}\right) (1 + L_1 + L_2)^{d - i}(L_1)^{i}{{\mathbb {E}}}\,\, \mathrm{{Tr}} \exp \left\{ (-1)^{i+1} t\left( \sum \limits _{k = 1}^{2d}\sum _{j \in H^k_2} {\widetilde{\mathbf {X}}}_j \right) \right\} , \end{aligned}$$

where

$$\begin{aligned} L_1 := pt\epsilon \exp (t\epsilon ), ~~L_2 := \exp \{\mathop {\text {card}}(H_1^1)tM\}\tau _0/\epsilon , \end{aligned}$$

and \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^k_2},\ k \in \lbrace 1, \dots , 2d\rbrace \), are mutually independent and have the same distributions as \(\lbrace \mathbf {X}_j \rbrace _{j \in H^k_2}\), \(k \in \lbrace 1, \dots , 2d\rbrace \).

Proof

We prove this lemma by induction. For any \(k\in \lbrace 1,\dots , d\rbrace \), we have \(H^k_1 = H^{2k-1}_2 \cup H^{2k}_2\) and hence \(\sum _{j \in H^k_1}\mathbf {X}_j = \sum _{j \in H^{2k-1}_2}\mathbf {X}_j + \sum _{j \in H^{2k}_2}\mathbf {X}_j\).

By Lemma A.2, for each \(k \in \lbrace 1,\dots , d\rbrace \), we can find a sequence of random matrices \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^{2k}_2}\) and an independent random variable \(U_k\), uniformly distributed on [0, 1], such that:

  1. \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^{2k}_2}\) is measurable with respect to the sigma field \(\sigma (\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k-1}_2})\vee \sigma (\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k}_2}) \vee \sigma (U_k)\);

  2. \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^{2k}_2}\) is independent of \(\sigma (\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k-1}_2})\);

  3. \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^{2k}_2}\) has the same distribution as \(\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k}_2}\);

  4. \({{\mathbb {P}}}(||\sum _{j \in H^{2k}_2} \mathbf {X}_j - \sum _{j \in H^{2k}_2} {\widetilde{\mathbf {X}}}_j ||> \epsilon _k) \le {{\mathbb {E}}}(||\sum _{j \in H^{2k}_2} \mathbf {X}_j - \sum _{j \in H^{2k}_2} {\widetilde{\mathbf {X}}}_j ||)/\epsilon _k \le \tau _0/\epsilon _k\), by Markov's inequality and the fact that \(\tau _0 = \sum _{j \in H^{2k}_2}{{\mathbb {E}}}(||\mathbf {X}_j - {\widetilde{\mathbf {X}}}_j ||)\).

To simplify notation, we set the \(\epsilon _k\) to a common value for \(k \in \lbrace 1, \dots , d\rbrace \) and denote it by \(\epsilon \). Moreover, we denote the event \(\Gamma _{k} = \lbrace ||\sum _{j \in H^{2k}_2} {\widetilde{\mathbf {X}}}_j - \sum _{j \in H^{2k}_2} \mathbf {X}_j ||\le \epsilon \rbrace \) for \(k \in \lbrace 1, \dots , d\rbrace \).

For the base case, \(k = 1\), we split the expectation according to whether the event \(\Gamma _1\) occurs:

$$\begin{aligned} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) = \underbrace{{{\mathbb {E}}}\left\{ \mathbf {1}_{\Gamma _1}\mathop {\text {Tr}}\exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} }_{I} + \underbrace{{{\mathbb {E}}}\left\{ \mathbf {1}_{\Gamma _1^c}\mathop {\text {Tr}}\exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} }_{II}. \end{aligned}$$

To bound I, we replace \(\sum _{j \in H^2_2}\mathbf {X}_j\) by its decoupled copy \(\sum _{j \in H^2_2}{\widetilde{\mathbf {X}}}_j\) and control the cost of this replacement on \(\Gamma _1\), using linearity of expectation and the facts that \(\mathop {\text {Tr}}(\mathbf {X}) \le p||\mathbf {X}||\) and \(||\exp (\mathbf {X}) - \exp (\mathbf {Y})||\le ||\mathbf {X}- \mathbf {Y}||\exp (||\mathbf {X}- \mathbf {Y}||)\exp (||\mathbf {Y}||)\). By the spectral mapping theorem, for a symmetric matrix \(\mathbf {X}\) with \(\Vert \mathbf {X}\Vert \le M\), we have \(\exp (||\mathbf {X}||) \le ||\exp (\mathbf {X})||\vee ||\exp (-\mathbf {X})||\le ||\exp (\mathbf {X})||+ ||\exp (-\mathbf {X})||\). Moreover, since \(\exp (\mathbf {X})\) is always positive definite for any symmetric matrix \(\mathbf {X}\) and \(||\mathbf {X}||\le \mathop {\text {Tr}}(\mathbf {X})\) for any positive definite symmetric matrix \(\mathbf {X}\), we obtain \(||\exp (\mathbf {X})||\le \mathop {\text {Tr}}\exp (\mathbf {X})\) and \(||\exp (-\mathbf {X})||\le \mathop {\text {Tr}}\exp (-\mathbf {X})\). In addition, \(\Vert \sum _{j \in H_2^2}(\mathbf {X}_j - {\widetilde{\mathbf {X}}}_j)\Vert \le \epsilon \) on \(\Gamma _1\). Putting these bounds together, we reach

$$\begin{aligned} I \le&\{1+pt\epsilon \exp (t\epsilon )\}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ t\left( \sum _{j \in H^1_2}\mathbf {X}_j +\sum _{j \in H^2_2}{\widetilde{\mathbf {X}}}_j + \sum _{k = 2}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} \nonumber \\&+ pt\epsilon \exp (t\epsilon ){{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ -t\left( \sum _{j \in H^1_2}\mathbf {X}_j +\sum _{j \in H^2_2}{\widetilde{\mathbf {X}}}_j + \sum _{k = 2}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} . \end{aligned}$$
(A.1)

We now turn to II. For this, the proof largely follows the same argument as in [3]. Omitting the details, we obtain

$$\begin{aligned} II\le&\exp \{\mathop {\text {card}}(H^1_1)tM\}(\tau _0/\epsilon ){{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ t\left( \sum _{j \in H^1_2} \mathbf {X}_j + \sum _{j \in H^2_2} {\widetilde{\mathbf {X}}}_j + \sum _{k = 2}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} . \end{aligned}$$
(A.2)

Denote \(L_1 := pt\epsilon \exp (t\epsilon )\) and \(L_2 := \exp \{\mathop {\text {card}}(H_1^1)tM\}\tau _0/\epsilon \). Combining (A.1) and (A.2) yields

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \\&\quad \le (1 + L_1 + L_2){{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ t\left( \sum _{j \in H^1_2} \mathbf {X}_j + \sum _{j \in H^2_2} {\widetilde{\mathbf {X}}}_j + \sum _{k = 2}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} \\&\qquad + L_1 {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ -t\left( \sum _{j \in H^1_2} \mathbf {X}_j + \sum _{j \in H^2_2} {\widetilde{\mathbf {X}}}_j + \sum _{k = 2}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} \\&\quad = \sum \limits _{i = 0}^{1}\left( {\begin{array}{c}1\\ i\end{array}}\right) (1 + L_1 + L_2)^{1 - i}(L_1)^{i}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^i t\left( \sum _{j \in H^1_2} \mathbf {X}_j + \sum _{j \in H^2_2} {\widetilde{\mathbf {X}}}_j + \sum _{k = 2}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \right\} . \end{aligned}$$

This finishes the base case.

The induction steps follow similarly, and we omit the details. Iterating d times, we arrive at the following inequality:

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \nonumber \\&\quad \le \sum \limits _{i = 0}^{d}\left( {\begin{array}{c}d\\ i\end{array}}\right) (1 + L_1 + L_2)^{d - i}(L_1)^{i}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^i t\left( \sum \limits _{k = 1}^{d}\sum _{j \in H^{2k-1}_2} \mathbf {X}_j + \sum \limits _{k = 1}^{d}\sum _{j \in H^{2k}_2} {\widetilde{\mathbf {X}}}_j\right) \right\} , \end{aligned}$$
(A.3)

where \(\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k-1}_2},\ k \in \lbrace 1, \dots , d\rbrace \), and \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^{2k}_2},\ k \in \lbrace 1, \dots , d\rbrace \), are mutually independent. In addition, they have the same distributions as \(\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k-1}_2},\ k \in \lbrace 1, \dots , d\rbrace \), and \(\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k}_2},\ k \in \lbrace 1, \dots , d\rbrace \), respectively. For simplicity and clarity, we also add a tilde to the matrices with indices in \(H^{2k-1}_2,\ k \in \lbrace 1, \dots , d\rbrace \), i.e., \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^{2k-1}_2}\) is distributed identically to \(\lbrace \mathbf {X}_j \rbrace _{j \in H^{2k-1}_2}\) for \(k \in \lbrace 1, \dots , d\rbrace \). Hence, (A.3) can be rewritten as

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \le \sum \limits _{i = 0}^{d}\left( {\begin{array}{c}d\\ i\end{array}}\right) (1 + L_1 + L_2)^{d - i}(L_1)^{i}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^i t\left( \sum \limits _{k = 1}^{2d}\sum _{j \in H^{k}_2} {\widetilde{\mathbf {X}}}_j \right) \right\} , \end{aligned}$$

where \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in H^k_2},\ k \in \lbrace 1, \dots , 2d\rbrace \) are mutually independent and their distributions are the same as \(\lbrace \mathbf {X}_j \rbrace _{j \in H^k_2},\ k \in \lbrace 1, \dots , 2d\rbrace \).

By changing \(\mathbf {X}\) to \(-\mathbf {X}\), we immediately get the following bound:

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( -t\sum _{k = 1}^{d}\sum _{j \in H^k_1}\mathbf {X}_j\right) \\&\quad \le \sum \limits _{i = 0}^{d}\left( {\begin{array}{c}d\\ i\end{array}}\right) (1 + L_1 + L_2)^{d - i}(L_1)^{i}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^{i+1} t\left( \sum \limits _{k = 1}^{2d}\sum _{j \in H^{k}_2} {\widetilde{\mathbf {X}}}_j \right) \right\} . \end{aligned}$$

This completes the proof of Lemma A.3. \(\square \)

A.5 Proof of Theorem 4.3

Proof

Without loss of generality, let \(\psi _1 = {\widetilde{\psi }}_1\).

Case I. First, we consider \(M = 1\).

Step I (Summation decomposition). Let \(B_0 = n\) and \(\mathbf {U}^{(0)}_{j} = \mathbf {X}_{j}\) for \(j \in \lbrace 1,\dots , n\rbrace \). Let \(K_{B_0}\) be the Cantor-like set constructed from \(\lbrace 1,\dots , B_0\rbrace \) as in Sect. A.3, \(K_{B_0}^c = \lbrace 1,\dots , B_0\rbrace \setminus K_{B_0}\), and \(B_1 = \mathop {\text {card}}(K_{B_0}^c)\). Then, define

$$\begin{aligned}&\mathbf {U}^{(1)}_{j} = \mathbf {X}_{i_j}, \text { where } i_j \in K_{B_0}^c = \lbrace i_1, \dots , i_{B_1}\rbrace . \end{aligned}$$

For each \(i\ge 1\), let \(K_{B_i}\) be constructed from \(\lbrace 1,\dots , B_{i} \rbrace \) by the same Cantor-like set construction. Denote \(K_{B_{i}}^c = \lbrace 1,\dots , B_{i} \rbrace \setminus K_{B_{i}}\) and \(B_{i+1} = \mathop {\text {card}}(K_{B_{i}}^c)\). Then

$$\begin{aligned} \mathbf {U}^{(i+1)}_{j} = \mathbf {U}^{(i)}_{k_j}, \text { where } k_j \in K_{B_{i}}^c = \lbrace k_1, \dots , k_{B_{i+1}}\rbrace . \end{aligned}$$

We stop the process at the smallest L such that \(B_L \le 2\). Then, for \(i\le L-1\), we have \(B_i \le n2^{-i}\), because each Cantor-like set \(K_{B_i}\) has cardinality at least \(B_{i}/2\). Also notice that \(L \le [\log n/\log 2]\).

For \(i \in \lbrace 0,\dots , L-1\rbrace \), denote

$$\begin{aligned} {{\mathbf {S}}}_i = \sum \limits _{j \in K_{B_i}} \mathbf {U}_j^{(i)} \text { and } {{\mathbf {S}}}_L = \sum _{j = 1}^{B_L} \mathbf {U}^{(L)}_j. \end{aligned}$$

Then, we observe

$$\begin{aligned} \sum _{j = 1}^n \mathbf {X}_j= \sum _{i = 0}^{L}{{\mathbf {S}}}_i. \end{aligned}$$
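The peeling in Step I is easy to mimic in code. The sketch below (reusing cantor_like_set from the snippet in Sect. A.3, and tracking indices only) stops once fewer than 50 indices remain, a simplification forced by our crude \(\ell _B\) computation; the paper's construction continues down to \(B_L \le 2\).

```python
def peel_indices(n: int, min_B: int = 50):
    """Index blocks behind S_0, ..., S_L: repeatedly take the Cantor-like set
    of the surviving indices, then recurse on its complement."""
    idx = list(range(1, n + 1))   # current positions mapped to original indices
    blocks = []
    while len(idx) >= min_B:
        K, _ = cantor_like_set(len(idx))
        in_K = set(K)
        blocks.append([idx[j - 1] for j in K])                      # feeds S_i
        idx = [idx[j - 1] for j in range(1, len(idx) + 1) if j not in in_K]
    blocks.append(idx)                                              # residual S_L
    return blocks

blocks = peel_indices(1000)
assert sorted(x for b in blocks for x in b) == list(range(1, 1001))  # exact partition
print([len(b) for b in blocks])   # sizes roughly halve, matching B_i <= n 2^{-i}
```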

Step II (Bounding the Laplace transform). This step hinges on the following lemma, which provides an upper bound for the Laplace transform of the sum of a sequence of random matrices that are \(\tau \)-mixing with geometric decay, i.e., \(\tau (k) \le \psi _1\exp \{-\psi _2(k-1)\}\) for all \(k \ge 1\) and some constants \(\psi _1, \psi _2 > 0\).

Lemma A.4

(Proof in Sect. A.6) For a sequence of \(p \times p\) matrices \(\lbrace \mathbf {X}_i \rbrace _{i \in \lbrace 1,\dots , B\rbrace }\) satisfying the conditions of Theorem 4.3 with \(M = 1\) and \(\psi _1\ge p^{-1}\), there exists a subset \(K_B \subseteq \lbrace 1,\dots , B\rbrace \) such that for \(0 < t \le \min \{1, \frac{\psi _2}{8\log (\psi _1B^6p)}\}\),

$$\begin{aligned}&\log {{\mathbb {E}}}\,\,\mathrm{{Tr}} \exp \bigg (t\sum _{j \in K_B}\mathbf {X}_j\bigg ) \le \log p + 4h(4)Bt^2\nu ^2 \\&\quad + 151\Big [1 + \exp \Big \{ \frac{1}{\sqrt{p}} \exp \Big (-\frac{\psi _2}{64t}\Big )\Big \}\Big ]\frac{t^2}{\psi _2} \exp \Big (-\frac{\psi _2}{64t}\Big ). \end{aligned}$$

For each \({{\mathbf {S}}}_i, i \in \lbrace 0,\dots , L-1\rbrace \), by applying Lemma A.4 with \(B = B_i\), we have for any positive t satisfying \(0 < t \le \min \{1,\frac{\psi _2}{8\log \{\psi _1 (n2^{-i})^6p\}}\}\),

$$\begin{aligned}&\log {{\mathbb {E}}}\mathop {\text {Tr}}\exp (t{{\mathbf {S}}}_i) \le \log p + t^2(C_12^{-i}n + C_{2,i}), \end{aligned}$$

where \(C_1 := 4h(4)\nu ^2\) and \(C_{2,i} := 302\cdot 2^{6i/8}/(\psi _2 n^{6/8})\).

Denote

$$\begin{aligned} \widetilde{f}(\psi _1, \psi _2, i)&:= \min \Big \{1, \frac{\psi _2}{8\log \{\psi _1 (n2^{-i})^6p\}}\Big \}. \end{aligned}$$

For any \(0 < t \le \widetilde{f}(\psi _1, \psi _2, i)\), we obtain

$$\begin{aligned} \log {{\mathbb {E}}}\mathop {\text {Tr}}\exp (t{{\mathbf {S}}}_i) \le \log p + \frac{t^2(C_12^{-i}n + C_{2,i})}{1 - t/\widetilde{f}(\psi _1, \psi _2, i)} \le \log p + \frac{t^2\{C_1^{\frac{1}{2}}(2^{-i}n)^{\frac{1}{2}} + C_{2,i}^{\frac{1}{2}}\}^2}{1 - t/\widetilde{f}(\psi _1, \psi _2, i)}. \end{aligned}$$

For \({{\mathbf {S}}}_L\), since \(B_L \le 2\), for \(0 < t \le 1\),

$$\begin{aligned} \log {{\mathbb {E}}}\mathop {\text {Tr}}\exp (t{{\mathbf {S}}}_L)&\le \log p + t^2h(2t)\lambda _{\max }\{{{\mathbb {E}}}({{\mathbf {S}}}_L^2)\} \le \log p + \frac{2t^2\nu ^2}{1-t}. \end{aligned}$$

Denote \(\sigma _i := C_1^{\frac{1}{2}}(2^{-i}n)^{\frac{1}{2}} + C_{2,i}^{\frac{1}{2}},\ \sigma _L := \sqrt{2}\nu ,\ \kappa _i := 1/\widetilde{f}(\psi _1, \psi _2, i),\ \text {and }\ \kappa _L := 1\).

Summing up, we have

$$\begin{aligned} \sum _{i = 0}^{L} \sigma _i&= \sum _{i = 0}^{L-1} \{C_1^{\frac{1}{2}}(2^{-i}n)^{\frac{1}{2}} + C_{2,i}^{\frac{1}{2}} \}+ \sqrt{2}\nu \le 15\sqrt{n}\nu + 60\sqrt{1/\psi _2},\\ \sum _{i = 0}^L \kappa _i&\le \frac{\log n}{\log 2}\max \left\{ 1,\frac{8\log (\psi _1n^6p)}{\psi _2}\right\} := {\widetilde{\psi }}(\psi _1,\psi _2,n,p). \end{aligned}$$

Hence, by Lemma 3 in [22], for \(0 < t \le \{{\widetilde{\psi }}(\psi _1,\psi _2,n,p)\}^{-1}\), we have

$$\begin{aligned} \log {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{j = 1}^n\mathbf {X}_j\right) \le \log p + \frac{t^2\left( 15\sqrt{n}\nu +60\sqrt{1/\psi _2}\right) ^2}{1 - t{\widetilde{\psi }}(\psi _1,\psi _2,n,p)}. \end{aligned}$$
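As a numeric sanity check of the bounds on \(\sum _i \sigma _i\) and \(\sum _i \kappa _i\) above, the following sketch evaluates both sides (the values of \(n, \nu , \psi _1, \psi _2, p\) are illustrative choices of ours, not from the paper):

```python
import math

n, nu, psi1, psi2, p = 10_000, 1.0, 1.0, 0.5, 100
L = int(math.log(n) / math.log(2))             # L <= [log n / log 2]
C1 = 4 * ((math.exp(4) - 4 - 1) / 16) * nu**2  # C_1 = 4 h(4) nu^2

sum_sigma = sum(math.sqrt(C1 * n * 2**-i)
                + math.sqrt(302 * 2**(6 * i / 8) / (psi2 * n**(6 / 8)))
                for i in range(L)) + math.sqrt(2) * nu
print(sum_sigma, "<=", 15 * math.sqrt(n) * nu + 60 * math.sqrt(1 / psi2))

sum_kappa = sum(max(1.0, 8 * math.log(psi1 * (n * 2**-i) ** 6 * p) / psi2)
                for i in range(L)) + 1.0
print(sum_kappa, "<=",
      (math.log(n) / math.log(2)) * max(1.0, 8 * math.log(psi1 * n**6 * p) / psi2))
```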

Step III (Matrix Chernoff bound). Lastly, by the matrix Chernoff bound, we obtain

$$\begin{aligned} {{\mathbb {P}}}\left\{ \lambda _{\max }\left( \sum _{j = 1}^{n}\mathbf {X}_j\right) \ge x\right\} \le p \exp \left\{ -\frac{x^2}{8(15^2n\nu ^2 + 60^2/\psi _2) + 2x{\widetilde{\psi }}(\psi _1,\psi _2,n,p)}\right\} . \end{aligned}$$
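To make the optimization behind Step III explicit, here is a short sketch of the derivation (the particular choice of t below is ours). Write \(\sigma ^2 := (15\sqrt{n}\nu + 60\sqrt{1/\psi _2})^2\) and \(\kappa := {\widetilde{\psi }}(\psi _1,\psi _2,n,p)\). For \(0 < t < \kappa ^{-1}\), the Chernoff transform and the Laplace-transform bound above give

$$\begin{aligned} {{\mathbb {P}}}\left\{ \lambda _{\max }\left( \sum _{j = 1}^{n}\mathbf {X}_j\right) \ge x\right\} \le e^{-tx}\,{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{j = 1}^n\mathbf {X}_j\right) \le p\exp \left( -tx + \frac{t^2\sigma ^2}{1 - t\kappa }\right) . \end{aligned}$$

Taking \(t = x/(2\sigma ^2 + \kappa x) < \kappa ^{-1}\) makes the exponent equal to \(-x^2/(4\sigma ^2 + 2\kappa x)\), and \(\sigma ^2 \le 2(15^2n\nu ^2 + 60^2/\psi _2)\) (by \((a+b)^2 \le 2a^2 + 2b^2\)) then yields the display above.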

Case II. We now consider general \(M>0\). It is obvious that if \(\{\mathbf {X}_t\}_{t \in {{\mathbb {Z}}}}\) is a sequence of \(\tau \)-mixing random matrices such that \(\tau (k; \{\mathbf {X}_t\}_{t \in {{\mathbb {Z}}}}, \Vert \cdot \Vert ) \le M\psi _1\exp \{-\psi _2(k-1)\}\), then \(\{\mathbf {X}_t/M\}_{t \in {{\mathbb {Z}}}}\) is also a sequence of \(\tau \)-mixing random matrices such that \(\tau (k; \{\mathbf {X}_t/M\}_{t \in {{\mathbb {Z}}}}, \Vert \cdot \Vert ) \le \psi _1\exp \{-\psi _2(k-1)\}\) and \(\Vert \mathbf {X}_t/M\Vert \le 1\). Then, applying the result of Case I to \(\{\mathbf {X}_t/M\}_{t \in {{\mathbb {Z}}}}\), we obtain

$$\begin{aligned}&{{\mathbb {P}}}\left\{ \lambda _{\max }\left( \sum _{j = 1}^{n}\mathbf {X}_j/M\right) \ge x\right\} \le p \exp \left\{ -\frac{x^2}{8(15^2n\nu _M^2 + 60^2/\psi _2) + 2x{\widetilde{\psi }}(\psi _1,\psi _2,n,p)}\right\} , \end{aligned}$$

where \(\nu _M^2 := \sup _{K\subseteq \lbrace 1,\dots , n \rbrace }\frac{1}{\mathop {\text {card}}(K)}\lambda _{\max }\bigg \{{{\mathbb {E}}}\bigg (\sum _{i \in K}\mathbf {X}_i/M\bigg )^2\bigg \} = \nu ^2/M^2 \) for \(\nu ^2\) defined in Theorem 4.3. Thus,

$$\begin{aligned}&{{\mathbb {P}}}\left\{ \lambda _{\max }\left( \sum _{j = 1}^{n}\mathbf {X}_j\right) \ge x\right\} \le p \exp \left\{ -\frac{x^2}{8(15^2n\nu ^2 + 60^2M^2/\psi _2) + 2xM{\widetilde{\psi }}(\psi _1,\psi _2,n,p)}\right\} . \end{aligned}$$

This completes the proof of Theorem 4.3. \(\square \)

A.6 Proof of Lemma A.4

Proof

Let \(K_B\) be constructed as in Sect. A.3 for an arbitrary \(B \ge 2\), and recall that \(M = 1\).

Case I. If \(0 < t \le 4/B\), then by Lemma 4 in [3], we have

$$\begin{aligned} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{i\in K_B} \mathbf {X}_i\right) \le p\exp \left[ t^2h\left\{ t\lambda _{\max }\left( \sum _{i\in K_B} \mathbf {X}_i\right) \right\} \lambda _{\max }\left\{ {{\mathbb {E}}}\left( \sum _{i\in K_B}\mathbf {X}_i\right) ^2\right\} \right] . \end{aligned}$$

By Weyl’s inequality, \(\lambda _{\max }(\sum _{i\in K_B} \mathbf {X}_i) \le B\) since \(\mathop {\text {card}}(K_B) \le B\), and by definition of \(\nu ^2\) in Theorem 4.3, we have \(\lambda _{\max }\{{{\mathbb {E}}}(\sum _{i\in K_B}\mathbf {X}_i)^2\} \le B\nu ^2\). Therefore, we obtain \(h\{t\lambda _{\max }(\sum _{i\in K_B} \mathbf {X}_i)\} \le h(tB) \le h(4)\) and

$$\begin{aligned} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \Big (t\sum _{i\in K_B} \mathbf {X}_i\Big ) \le p\exp \{t^2h(4)B\nu ^2\}. \end{aligned}$$
(A.4)

Case II. Now we consider the case where \(4/B < t \le \min \{1, \frac{\psi _2}{8\log (\psi _1B^6p)}\}\).

Step I. Let J be an integer chosen from \(\lbrace 0, \dots , \ell _B\rbrace \), whose actual value will be determined later. We use the same notation for Cantor-like sets as in Sect. A.3. By Lemma A.3 and an induction argument similar to that in [3], we obtain

$$\begin{aligned} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{j \in K_0^1}\mathbf {X}_j\right) \le&\sum \limits _{i_1 = 0}^{2^0}\dots \sum \limits _{i_J = 0}^{2^{J-1}}\left[ \left( \prod \limits _{k = 1}^{J} A_{k,i_k}\right) {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^{\sum _{k = 1}^{J} i_k}t\left( \sum \limits _{i' = 1}^{2^J}\sum _{j \in K_{J}^{i'}}{\widetilde{\mathbf {X}}}_j\right) \right\} \right] , \end{aligned}$$
(A.5)

where \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j \in K_{J}^{i'}}\) for \(i' \in \lbrace 1, \cdots , 2^J \rbrace \) are mutually independent and have the same distributions as \(\lbrace \mathbf {X}_j\rbrace _{j \in K_{J}^{i'}}\) for \(i' \in \lbrace 1, \cdots , 2^J \rbrace \), and

$$\begin{aligned} A_{k,i_k}&:= \left( {\begin{array}{c}2^{k-1}\\ i_k\end{array}}\right) (1 + L_{k,1} + L_{k,2})^{2^{k-1}-i_k}(L_{k,1})^{i_k},\\ \epsilon _k&:=(2pt)^{-\frac{1}{2}}\{2^{\ell -k}n_\ell \exp (t2^{\ell -k+1} n_\ell )\tau _{d_{k-1}+1}\}^{\frac{1}{2}},\\ L_{k,1}&:= (pt/2)^{\frac{1}{2}}\exp (t\epsilon _k)\{2^{\ell -k}n_\ell \exp (t2^{\ell -k+1} n_\ell )\tau _{d_{k-1}+1}\}^{\frac{1}{2}},\\ L_{k,2}&:= (2pt)^{\frac{1}{2}}\exp (t\epsilon _k)\{2^{\ell -k}n_\ell \exp (t2^{\ell -k+1} n_\ell )\tau _{d_{k-1}+1}\}^{\frac{1}{2}}. \end{aligned}$$

Step II. Now we choose J as follows:

$$\begin{aligned} J&= \inf \left\{ k\in \lbrace 0, \dots , \ell \rbrace : \frac{B(1-\delta )^k}{2^k} \le \min \Big \{\frac{\psi _2}{8t^2}, B\Big \}\right\} . \end{aligned}$$

We first bound \({{\mathbb {E}}}\mathop {\text {Tr}}\exp \{t(\sum \limits _{i' = 1}^{2^J}\sum _{j \in K_{J}^{i'}}{\widetilde{\mathbf {X}}}_j)\}\) and \({{\mathbb {E}}}\mathop {\text {Tr}}\exp \{-t(\sum \limits _{i' = 1}^{2^J}\sum _{j \in K_{J}^{i'}}{\widetilde{\mathbf {X}}}_j)\}\). From (A.5), we obtain \(2^J\) sets of \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace \) that are mutually independent. To make the notation less cluttered, we will drop the tilde from \( {\widetilde{\mathbf {X}}}_j\) for all j. Denote the number of matrices in each set \(K_{J}^i\) by \(q := 2^{\ell -J}n_\ell \). We divide each set \(K_{J}^i,\ i \in \lbrace 1,\dots , 2^J\rbrace \), into consecutive subsets of cardinality \(\widetilde{q}\), with potentially one residual subset if q is not divisible by \(\widetilde{q}\). More specifically, we have \(2\widetilde{q} \le q\) and set \(m_{q,\widetilde{q}} := [q/2\widetilde{q}]\). The value \({\widetilde{q}}\) will be determined later.

Then, each set \(K_{J}^i\) contains \(2m_{q,\widetilde{q}}\) sets of cardinality \(\widetilde{q}\) and one set of cardinality less than \(2\widetilde{q}\). For each \(K_{J}^i,\ i \in \lbrace 1, \dots , 2^J\rbrace \), denote these consecutive sets by \(Q_k^i,\ k \in \lbrace 1,\dots , 2m_{q,\widetilde{q}}+1\rbrace \). Given this notation, we can rewrite the expectation as follows:

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{i = 1}^{2^J}\sum _{j \in K_{J}^i}\mathbf {X}_j\right) \\&\quad = {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{i = 1}^{2^J}\sum _{k = 1}^{2m_{q,\widetilde{q}}+1}\sum _{j \in Q_k^i}\mathbf {X}_j\right) ={{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}}\sum _{j \in Q_{2k}^i}\mathbf {X}_j + t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) . \end{aligned}$$

Since \(\mathop {\text {Tr}}\exp (\cdot )\) is convex (cf. Proposition 2 in [25]), by Jensen’s inequality, we have

$$\begin{aligned} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{i = 1}^{2^J}\sum _{j \in K_{J}^i}\mathbf {X}_j\right)&\le \frac{1}{2}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}}\sum _{j \in Q_{2k}^i}\mathbf {X}_j\right) \\&\quad + \frac{1}{2}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) . \end{aligned}$$

Since the number of odd-indexed sets is always equal to or one more than the number of even-indexed sets, the upper bound for \(\frac{1}{2}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \Big (2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}}\sum _{j \in Q_{2k}^i}\mathbf {X}_j\Big )\) is always at most that for \(\frac{1}{2}{{\mathbb {E}}}\mathop {\text {Tr}}\exp \Big (2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\Big )\). Hence, we only need to provide an upper bound for \({{\mathbb {E}}}\mathop {\text {Tr}}\exp \Big (2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\Big )\). Our goal is then to replace all \(\lbrace \mathbf {X}_j\rbrace _{j\in Q_{2k-1}^i}\) in the last expression by mutually independent copies \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_{2k-1}^i}\) with the same distributions, for \(k \in \lbrace 1,\dots , 2m_{q,\widetilde{q}}+1 \rbrace \) and \(i \in \lbrace 1,\dots , 2^J\rbrace \). Again, we proceed by induction. We first show

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \\&\quad \le \sum \limits _{i_1 = 0}^{1}\widetilde{A}_{i_1} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^{i_1}2t\left( \sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^1}{\widetilde{\mathbf {X}}}_j + \sum _{i = 2}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \right\} , \end{aligned}$$

where the constants \({\widetilde{A}}_{i_1}\) will be specified below. For each \(\lbrace \mathbf {X}_j\rbrace _{j \in Q_{2k-1}^1},\ k \in \lbrace 1, \dots , m_{q,\widetilde{q}}+1\rbrace \), we can find a sequence \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j \in Q_{2k-1}^1},\ k \in \lbrace 1, \dots , m_{q,\widetilde{q}}+1\rbrace \), such that these sequences are mutually independent. More specifically, let \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1} = \lbrace \mathbf {X}_j \rbrace _{j\in Q_1^1}\). By applying Lemma A.2 to \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1}\) and \(\lbrace \mathbf {X}_j\rbrace _{j\in Q_3^1}\) with a chosen \({\widetilde{\epsilon }} > 0\), we may find a sequence of random matrices \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j \in Q_3^1}\) such that for each \(j_0\in Q_3^1\):

  1. \({\widetilde{\mathbf {X}}}_{j_0}\) is measurable with respect to \(\sigma (\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1}) \vee \sigma (\mathbf {X}_{j_0}) \vee \sigma (\widetilde{U}_{j_0}^1)\);

  2. \({\widetilde{\mathbf {X}}}_{j_0}\) is independent of \(\sigma (\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1})\);

  3. \({\widetilde{\mathbf {X}}}_{j_0}\) has the same distribution as \(\mathbf {X}_{j_0}\);

  4. \({{\mathbb {P}}}(\Vert {\widetilde{\mathbf {X}}}_{j_0} - \mathbf {X}_{j_0} \Vert \ge {\widetilde{\epsilon }}) \le {{\mathbb {E}}}(\Vert {\widetilde{\mathbf {X}}}_{j_0} - \mathbf {X}_{j_0} \Vert )/{\widetilde{\epsilon }} \le \tau _{\widetilde{q}+1}/{\widetilde{\epsilon }}\), by Markov's inequality.

For each \(j_0\in Q_3^1\), \(\widetilde{U}_{j_0}^1\) is independent of \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1}\) and \(\mathbf {X}_{j_0}\). In addition, since there are at least \(\widetilde{q}\) matrices between \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1}\) and \(\mathbf {X}_{j_0}\) by our construction, we have \(\tau \{\sigma (\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j\in Q_1^1}), \mathbf {X}_{j_0}; \Vert \cdot \Vert \} \le \tau _{\widetilde{q}+1}\). Note that \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j \in Q_3^1}\) is independent of \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j\in Q_1^1}\), but the matrices within \(Q_3^1\) are not mutually independent.

Following induction steps similar to the base case, and omitting redundant details, we obtain

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \\&\quad \le \sum \limits _{i_1 = 0}^{1}\widetilde{A}_{i_1} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^{i_1}2t\left( \sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^1}{\widetilde{\mathbf {X}}}_j + \sum _{i = 2}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \right\} , \end{aligned}$$

where

$$\begin{aligned} {\widetilde{\epsilon }}&:= (4pt)^{-\frac{1}{2}}\{\exp (2tq)\tau _{\widetilde{q}+1}\}^{\frac{1}{2}},\\ \widetilde{L}_{1}&:= \frac{1}{2}(4pt)^{\frac{1}{2}}q\exp (2tq{\widetilde{\epsilon }})\{\exp (2tq)\tau _{\widetilde{q}+1}\}^{\frac{1}{2}},\\ \widetilde{L}_{2}&:= (4pt)^{\frac{1}{2}}q\{\exp (2tq)\tau _{\widetilde{q}+1}\}^{\frac{1}{2}},\\ \widetilde{A}_{i_1}&:= \left( {\begin{array}{c}1\\ i_1\end{array}}\right) (1 + \widetilde{L}_{1} + \widetilde{L}_{2})^{1 - i_1}(\widetilde{L}_{1})^{i_1}. \end{aligned}$$

This completes the base case.

Iterating the above calculation, we arrive at the following bound:

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \nonumber \\&\quad \le \sum \limits _{i_1 = 0}^{1}\dots \sum \limits _{i_{2^J} = 0}^{1}\left( \prod \limits _{r = 1}^{2^J}\! \widetilde{A}_{i_r}\right) {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left\{ (-1)^{\sum _{r = 1}^{2^J}i_{r}}2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}{\widetilde{\mathbf {X}}}_j\right\} , \end{aligned}$$
(A.6)

where \(\lbrace {\widetilde{\mathbf {X}}}_j\rbrace _{j \in Q_{2k-1}^i}\) for \((i,k) \in \lbrace 1,\dots , 2^J \rbrace \times \lbrace 1, \dots , m_{q,\widetilde{q}}+1 \rbrace \) are mutually independent and identically distributed as \(\lbrace \mathbf {X}_j\rbrace _{j \in Q_{2k-1}^i}\) for \((i,k) \in \lbrace 1,\dots , 2^J \rbrace \times \lbrace 1, \dots , m_{q,\widetilde{q}}+1 \rbrace \), and

$$\begin{aligned} {\widetilde{\epsilon }}&:= (4pt)^{-\frac{1}{2}}\{\exp (2tq)\tau _{\widetilde{q}+1}\}^{\frac{1}{2}},\\ \widetilde{L}_{1}&:= \frac{1}{2}(4pt)^{\frac{1}{2}}q\exp (2tq{\widetilde{\epsilon }})\{\exp (2tq)\tau _{\widetilde{q}+1}\}^{\frac{1}{2}},\\ \widetilde{L}_{2}&:= (4pt)^{\frac{1}{2}}q\{\exp (2tq)\tau _{\widetilde{q}+1}\}^{\frac{1}{2}},\\ \widetilde{A}_{i_r}&:= \left( {\begin{array}{c}1\\ i_r\end{array}}\right) (1 + \widetilde{L}_{1} + \widetilde{L}_{2})^{1 - i_r}(\widetilde{L}_{1})^{i_r}. \end{aligned}$$

Let \(\widetilde{q} := [2/t]\wedge [q/2]\). The sets \(\lbrace {\widetilde{\mathbf {X}}}_j \rbrace _{j \in Q_{2k-1}^i}\) for \((i,k) \in \lbrace 1, \dots , 2^J\rbrace \times \lbrace 1,\dots , m_{q,\widetilde{q}} +1 \rbrace \) are mutually independent with mean \({\mathbf {0}}\), and \(2^J\sum _{k = 1}^{m_{q, \widetilde{q}} + 1}\mathop {\text {card}}(Q_{2k-1}^i) \le B\). Moreover, by Weyl's inequality, for \((i,k) \in \lbrace 1, \dots , 2^J\rbrace \times \lbrace 1,\dots , m_{q,\widetilde{q}} +1 \rbrace \), we have

$$\begin{aligned} 2\lambda _{\max }\left( \sum _{j \in Q_{2k-1}^i} {\widetilde{\mathbf {X}}}_j\right) \le 2\widetilde{q} \le \frac{4}{t}. \end{aligned}$$

By Lemma 4 in [3], we obtain

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}{\widetilde{\mathbf {X}}}_j\right) \le p\exp \{4h(4)Bt^2\nu ^2\}, \end{aligned}$$
(A.7)
$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( -2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}{\widetilde{\mathbf {X}}}_j\right) \le p\exp \{4h(4)Bt^2\nu ^2\}. \end{aligned}$$
(A.8)

Plugging (A.7) and (A.8) into (A.6) and using the fact that \(\sum \limits _{i_r = 0}^{1} \widetilde{A}_{i_r} = 1 + 2\widetilde{L}_1 + \widetilde{L}_2\), we obtain

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( 2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \le (1 + 2\widetilde{L}_1 + \widetilde{L}_2)^{2^J} p\exp \{4h(4)Bt^2\nu ^2\}. \end{aligned}$$
(A.9)

By replacing \(\mathbf {X}\) by \(-\mathbf {X}\), we obtain

$$\begin{aligned}&{{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( -2t\sum _{i = 1}^{2^J}\sum _{k = 1}^{m_{q,\widetilde{q}}+1}\sum _{j \in Q_{2k-1}^i}\mathbf {X}_j\right) \le (1 + 2\widetilde{L}_1 + \widetilde{L}_2)^{2^J} p\exp \{4h(4)Bt^2\nu ^2\}. \end{aligned}$$
(A.10)

Combining (A.5) with (A.9) and (A.10), we get

$$\begin{aligned} {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{j \in K_B}\mathbf {X}_j\right) \le&\sum \limits _{i_1 = 0}^{2^0}\dots \sum \limits _{i_{J} = 0}^{2^{J-1}}\left[ \left( \prod \limits _{k = 1}^{J}A_{k,i_k}\right) (1 + 2\widetilde{L}_1 + \widetilde{L}_2)^{2^J} p\exp \{4h(4)Bt^2\nu ^2\}\right] \nonumber \\ =&\left\{ \prod \limits _{k = 1}^{J} (1 + 2L_{k,1} + L_{k,2})^{2^{k-1}}\right\} (1 + 2\widetilde{L}_1 + \widetilde{L}_2)^{2^J} p\exp \{4h(4)Bt^2\nu ^2\}, \end{aligned}$$
(A.11)

where the last equality follows from the binomial identity \(\sum _{i_k = 0}^{2^{k-1}}A_{k,i_k} = (1 + 2L_{k,1} + L_{k,2})^{2^{k-1}}\).

By using \(\log (1 + x) \le x\) for \(x \ge 0\), we have

$$\begin{aligned}&\log {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{j \in K_B}\mathbf {X}_j\right) \le \sum \limits _{k = 1}^{J} 2^{k -1}(2L_{k,1} + L_{k,2}) +2^J(2\widetilde{L}_1 + \widetilde{L}_2) +\log [p\exp \{4h(4)Bt^2\nu ^2\}]. \end{aligned}$$
(A.12)

For simplicity, we denote \(I := \sum \limits _{k = 1}^{J} 2^{k -1}(2L_{k,1} + L_{k,2})\) and \(II := 2^J(2\widetilde{L}_1 + \widetilde{L}_2)\) in (A.12).

Step III. Following calculations similar to those in [3], we obtain

$$\begin{aligned} I&\le \frac{32\sqrt{2}}{\log 2}\left[ 1 + \exp \left\{ \frac{1}{\sqrt{2p}}\exp \left( -\frac{\psi _2}{16t}\right) \right\} \right] \frac{t^2}{\psi _2}\exp \left( -\frac{\psi _2}{32t}\right) . \end{aligned}$$
(A.13)

and

$$\begin{aligned} II \le 128 \left[ 1 + \exp \left\{ \frac{1}{\sqrt{p}} \exp \left( -\frac{\psi _2}{32t}\right) \right\} \right] \frac{t^2}{\psi _2} \exp \left( -\frac{\psi _2}{64t}\right) . \end{aligned}$$
(A.14)

Hence, by combining (A.4), (A.12), (A.13) and (A.14), we obtain for \(0 < t \le \min \{1, \frac{\psi _2}{8\log (\psi _1B^6p)}\}\),

$$\begin{aligned}&\log {{\mathbb {E}}}\mathop {\text {Tr}}\exp \left( t\sum _{j \in K_B}\mathbf {X}_j\right) \\ \le&\log p + 4h(4)Bt^2\nu ^2 + 151\left[ 1 + \exp \left\{ \frac{1}{\sqrt{p}} \exp \left( -\frac{\psi _2}{64t}\right) \right\} \right] \frac{t^2}{\psi _2} \exp \left( -\frac{\psi _2}{64t}\right) . \end{aligned}$$

This completes the proof of Lemma A.4. \(\square \)

Cite this article

Han, F., Li, Y. Moment Bounds for Large Autocovariance Matrices Under Dependence. J. Theor. Probab. 33, 1445–1492 (2020). https://doi.org/10.1007/s10959-019-00922-z