1 Introduction

Estimation of covariance matrices in a high-dimensional setting has been one of the fundamental statistical problems of the last decade. Statistical applications of covariance matrix estimation include ridge regression (Hoerl and Kennard 1970), regularized discriminant analysis (Friedman 1989) and principal component analysis (Johnstone and Lu 2009). For an overview of this topic and its applications see (Bickel and Levina 2008b; Birnbaum and Nadler 2012; Chen et al. 2013; Fan et al. 2006; Rothman et al. 2009). The problem of estimating covariance matrices for dependent data has recently been investigated by (Chen et al. 2013; Bhattacharjee and Bose 2014a, b; Guo et al. 2016; Jentsch and Politis 2015; McMurry and Politis 2010), and Wu and Pourahmadi (2009). The estimation of the inverse covariance matrix is used to recover the true unknown structure of undirected graphical models, especially Gaussian graphical models, where a zero entry of the inverse covariance matrix corresponds to a missing edge between two vertices in the graph. The recovery of undirected graphs based on the estimation of precision matrices for a general class of nonstationary time series is considered in Xu et al. (2020).

Consider a p-dimensional linear process

$$\begin{aligned} {\mathbf {X}}_{t}=\sum _{j=0}^{\infty }{\varvec{\Psi }}_{j}{\varepsilon } _{t-j}\text { (almost surely),} \end{aligned}$$
(1)

where the \({\varvec{\Psi }}_{j}\) are \(p\times p\) matrices, \({ \varepsilon }_{i}=\left( \varepsilon _{i,1},\ldots ,\varepsilon _{i,p}\right) ^{^{\prime }}\), and \(\left( {\varepsilon }_{t}\right) \) are i.i.d. vectors in \({\mathbb {R}}^{p}\) with mean \({\mathbf {0}}\) and variance-covariance matrix \(\varvec{\Sigma }\). Under a causality condition, a vector ARMA process, which is a basic model in econometrics and finance, is a linear process (Brockwell and Davis 2002). We assume that \(\left( {\varepsilon }_{t}\right) \) satisfies one of the following conditions:

(Gauss):

\({\varepsilon }_{i}\) is Gaussian with mean \({\mathbf {0}} \) and variance-covariance matrix \({\varvec{\Sigma }}\).

(SGauss):

\(\left( \varepsilon _{i,l}\varepsilon _{j,s}\right) \) is sub-Gaussian with constant \(\sigma ^{2}\), that is,

$$\begin{aligned} E\exp \left( u\varepsilon _{i,l}\varepsilon _{j,s}\right) \le \exp \left( \sigma ^{2}u^{2}/2\right) \end{aligned}$$

for all \(u\in {\mathbb {R}}\), \(i,j=1,2,...\) and \(l,s=1,\ldots ,p\),

(NGa\(_{\beta }\)):

\(E\left| \varepsilon _{i,j}\right| ^{\beta }<\infty \) for some \(\beta >2\) for \(i=1,2,...\) and \(j=1,...,p\).

For example, condition (SGauss) is satisfied for bounded sequences \(\left( \varepsilon _{i,l}\right) \), i.e. when \(\sup _{i,l}\left| \varepsilon _{i,l}\right| \le M\) for some \(M>0\). We can observe that (SGauss) is implied by a sub-Gaussian condition for the vectorization \(vec\left( {\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j}\right) \) of the Kronecker product \({\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j}\) for all \(i,j\), i.e. for all \(u\in {\mathbb {R}} ^{p^{2}}\)

$$\begin{aligned} E\exp ( u^{^{\prime }}vec({\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j})) \le \exp ( u^{^{\prime }}\sigma ^{2}u/2) \end{aligned}$$

for some \(\sigma ^{2}>0\). Condition (NGa\(_{\beta }\)) is a moment condition for the innovation process without assumptions on the dependency of the coordinates of the innovation process \(\left( {\varepsilon }_{i}\right) \).

Let \({\varvec{\Gamma }}_{k}\) be the kth order autocovariance matrix,

$$\begin{aligned} {\varvec{\Gamma }}_{k}={{\,\mathrm{Cov}\,}}({\mathbf {X}}_{t},{\mathbf {X}}_{t-k}^{^{\prime }})=\sum _{j=k}^{\infty }{\varvec{\Psi }}_{j}{\varvec{\Sigma \Psi }} _{j-k}^{^{\prime }}\text {.} \end{aligned}$$
(2)

In practice the matrices \( {\varvec{\Psi }}_{j}\) and \({\varvec{\Sigma }}\) are unknown, so the matrix \({\varvec{\Gamma }}_{k}\) will be estimated from the sample \({\mathbf {X}}_{1},\ldots ,{\mathbf {X}}_{n}\). Define the sample autocovariance matrix of order k as

$$\begin{aligned} \varvec{\hat{\Gamma }}_{k}=\frac{1}{n-k}\sum _{t=k+1}^{n}\mathbf {X}_{t}\mathbf { X}_{t-k}^{{\prime }}=:(\hat{\gamma }_{ij}^{k}) \end{aligned}$$

for \(0\le k\le n-1\), and the banded version of \({\varvec{{\hat{\Gamma }}}}_{k}\) (as in Bhattacharjee and Bose (2014b)) is given by

$$\begin{aligned} \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})=(\hat{\gamma }_{ij}^{k}\mathbf {1 }\left( \left| i-j\right| \le l_{n}\right) ) \end{aligned}$$
(3)

for some sequence of banding parameters \(l_{n}\rightarrow \infty \) as \(n\rightarrow \infty \), where \({\mathbf {1}}(\cdot )\) is the indicator function. We will assume that \(p=p(n)\rightarrow \infty \) as \(n\rightarrow \infty \).
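To fix ideas, the following minimal NumPy sketch computes the sample autocovariance matrix \(\varvec{\hat{\Gamma }}_{k}\) and its banded version \({\mathbf {B}}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})\) from (3); the function names are illustrative, not part of any package, and the data are assumed to be stored row-wise in an \(n\times p\) array.

```python
import numpy as np

def sample_autocov(X, k):
    """Sample autocovariance of order k: (1/(n-k)) * sum_{t=k+1}^{n} X_t X_{t-k}',
    for mean-zero data X of shape (n, p) (rows are the observations X_1, ..., X_n)."""
    n, p = X.shape
    return X[k:].T @ X[:n - k] / (n - k)

def band(M, l):
    """Banding operator B_l: keep entries with |i - j| <= l, set the rest to zero."""
    i, j = np.indices(M.shape)
    return np.where(np.abs(i - j) <= l, M, 0.0)

# Example: band the lag-1 sample autocovariance of an (n, p) data matrix X
# G_hat_banded = band(sample_autocov(X, k=1), l=5)
```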

The main contribution of our paper is that in Theorem 1 we obtain the rate of convergence in the operator norm of \({\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\) for a high-dimensional linear process,

$$\begin{aligned} \big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }} _{k}\big \Vert _{2}={\mathcal {O}}_{P}( \Vert {\varvec{\Sigma }} \Vert _{(1,1)}( c_{n}) ^{\alpha /(\alpha +1)}) \text {,} \end{aligned}$$

for some \(c_{n}\rightarrow 0\) and some \(\alpha >0\). Under the sub-Gaussian conditions ((Gauss) or (SGauss)) we obtain the same rate \({\mathcal {O}}_{P}( (n^{-1} \log p)^{\alpha /2(\alpha +1)}) \) as Bhattacharjee and Bose (2014b), but under weaker assumptions on the coefficient matrices \( {\varvec{\Psi }}_{j} \) and on \( {\varvec{\Sigma }} \). In particular, under causality, our results include vector autoregressive AR(r) processes. We obtain similar results (Corollary 1) for the precision matrix:

$$\begin{aligned} \big \Vert {\mathbf {B}}_{l_{n}}^{-1}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}^{-1} \big \Vert _{2}={\mathcal {O}}_{P}(\Vert {\varvec{ \Sigma }}\Vert _{(1,1)}( c_{n}) ^{\alpha /(\alpha +1)}) \text {.} \end{aligned}$$

An interesting problem is to obtain lower bounds and the optimal rate of convergence. Cai et al. (2010) obtained the minimax bound for i.i.d. observations for tapering estimators of \({\varvec{\Gamma }}_{0}\). For dependent data this problem is still open. Below we briefly present the state of research related to the estimation of the covariance matrix for independent and dependent observations.

The sample covariance matrix \(\varvec{\hat{\Gamma }}_{0}=(\hat{\gamma } _{ij}^{0})\) performs poorly in a high dimensional setting. In the Gaussian case when \({\varvec{\Gamma }}_{0}={\mathbf {I}}\) is the identity matrix and \(p/n\rightarrow c \in (0,1)\), the empirical distribution of the eigenvalues of the sample covariance matrix \({\varvec{\hat{\Gamma }}}_{0}\) converges to the Marchenko and Pastur (1967) law, which is supported on the interval \([( 1-\sqrt{c})^{2}, (1+\sqrt{c})^{2}]\).

For the i.i.d. case Bickel and Levina (2008a) proposed thresholding of the sample covariance matrix and obtained rates of convergence for the resulting estimator under a proper choice of the threshold \(\lambda _{n}\), where the estimator is given by

$$\begin{aligned} {\hat{\gamma }}_{ij}^{\lambda }={\hat{\gamma }}_{ij}^{0}{\mathbf {1}}(\vert {\hat{\gamma }}_{ij}^{0}\vert \ge \lambda _{n})\text {.} \end{aligned}$$

Rothman et al. (2009) considered a class of universal thresholding rules with more general thresholding functions than hard thresholding. An interesting generalization of this method can be found in Cai and Liu (2011) for sparse covariance matrices, where an adaptive thresholding estimator is given by

$$\begin{aligned} {\hat{\gamma }}_{ij}^{*}=S_{\lambda _{ij}}({\hat{\gamma }}_{ij}^{0})\text {,} \end{aligned}$$

where \(S_{\lambda _{ij}}(\cdot )\) is a general thresholding function with data-driven thresholds \(\lambda _{ij}\). For other interesting results in this area, see (Birnbaum and Nadler 2012; Cai et al. 2010; Fan et al. 2006; Furrer and Bengtsson 2007; Huang et al. 2006).
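As an illustration, the following hedged sketch implements the universal hard-thresholding rule above and a soft-thresholding rule as one example of a more general thresholding function \(S_{\lambda }\); passing a matrix of entry-wise thresholds mimics the adaptive scheme. The function names are illustrative and are not taken from any of the cited papers.

```python
import numpy as np

def hard_threshold(G0, lam):
    """Universal hard thresholding: keep an entry of the sample covariance matrix G0
    only if its absolute value is at least the threshold lam."""
    return np.where(np.abs(G0) >= lam, G0, 0.0)

def soft_threshold(G0, Lam):
    """Entry-wise soft thresholding S_lambda(x) = sign(x) * max(|x| - lambda, 0);
    Lam may be a scalar or a (p, p) array of data-driven thresholds lambda_ij."""
    return np.sign(G0) * np.maximum(np.abs(G0) - Lam, 0.0)
```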

There are few results for high-dimensional dependent data: (Bhattacharjee and Bose 2014a, b; Chen et al. 2013; Jentsch and Politis 2015) and Guo et al. (2016). Bhattacharjee and Bose (2014a) considered the estimation of the high dimensional variance-covariance matrix under a general Gaussian model with weak dependence in both rows and columns of the data matrix. They showed that the banded and tapered sample variance-covariance matrices are consistent under a suitable column dependence model. However, their conditions do not allow control of the first few autocovariances; they control only higher-order autocovariances. Bhattacharjee and Bose (2014b) showed that under suitable assumptions on the linear process, the banded sample autocovariance matrices are consistent in the high dimensional setting. Chen et al. (2013) obtained the rate of convergence for a banded autocovariance estimator in operator norm for a general dependent model. A similar result under more restrictive assumptions was obtained by Jentsch and Politis (2015), and Guo et al. (2016) established similar results for multivariate sparse autoregressive AR processes.

The rest of the paper is organized as follows. In “The rate of convergence of autocovariance estimation” section we deal with the problem of estimating the kth order autocovariance matrix \({\varvec{\Gamma }}_{k}\) by (3) in high dimensions (Theorem 1). The rate of convergence for the estimator of the precision matrix \({\varvec{\Gamma }}_{k}^{-1}\) is given in Corollary 1.

In Proposition 1 we obtain bounds on the error probability of the estimation of the kth order autocovariance matrix. In “Comparison of our results with previous studies” section we compare our results with the results obtained by (Bickel and Levina 2008a, b) for independent normal and nonnormal data, with the minimax upper bound for the tapering estimator in Cai et al. (2010), and with the results for dependent data obtained by (Chen et al. 2013; Bhattacharjee and Bose 2014b; Guo et al. 2016) and Jentsch and Politis (2015). In the special case of a multi-dimensional linear process we obtain a sharper bound (Theorem 1) than Chen et al. (2013) and Jentsch and Politis (2015). We also obtain a better rate for the estimation error for multivariate sparse AR processes than Guo et al. (2016).

Finally, the conclusions are presented in Sect. 2.5. All the proofs and auxiliary lemmas are given in the “Appendix”.

2 The rate of convergence of autocovariance estimation

For any \(p\times p\) matrix \({\mathbf {M=}}\left( m_{ij}\right) \) we define the following matrix norms which are convenient for comparison with other results of autocovariance estimation:

$$\begin{aligned} \left\| {\mathbf {M}}\right\| _{2}=\sqrt{\lambda _{\max }\left( {\mathbf {M}} ^{^{\prime }}{\mathbf {M}}\right) }\text { (operator norm),} \end{aligned}$$

where \(\lambda _{\max }\left( {\mathbf {M}}^{^{\prime }}{\mathbf {M}}\right) \) is the maximum eigenvalue of the matrix \({\mathbf {M}}^{^{\prime }}{\mathbf {M}}\),

$$\begin{aligned} T\left( {\mathbf {M}},t\right) =\max _{1\le j\le p}\sum _{i:\left| i-j\right| >t}\left| m_{ij}\right| \end{aligned}$$

for some threshold \(t>0\),

$$\begin{aligned} \left\| {\mathbf {M}}\right\| _{(1,1)}= & {} \max _{1\le j\le p}\sum _{i=1}^{p}\left| m_{ij}\right| \text {,} \\ \left\| {\mathbf {M}}\right\| _{\infty }= & {} \max _{1\le i,j\le p}\left| m_{ij}\right| \text {.} \end{aligned}$$
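For concreteness, the following small sketch evaluates these norms and the tail quantity \(T({\mathbf {M}},t)\) with NumPy; the helper names are illustrative.

```python
import numpy as np

def op_norm(M):
    """Operator norm: largest singular value, i.e. sqrt(lambda_max(M'M))."""
    return np.linalg.norm(M, 2)

def tail(M, t):
    """T(M, t): maximum over columns j of the sum of |m_ij| over rows i with |i - j| > t."""
    i, j = np.indices(M.shape)
    return np.max(np.sum(np.where(np.abs(i - j) > t, np.abs(M), 0.0), axis=0))

def norm_11(M):
    """(1,1)-norm: maximum absolute column sum."""
    return np.max(np.sum(np.abs(M), axis=0))

def norm_sup(M):
    """Supremum norm: largest absolute entry."""
    return np.max(np.abs(M))
```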

We consider the following conditions on the matrices \(\varvec{\Psi }_{j}\) (see (1)), \( {\varvec{\Gamma }}_{k}\) and \(\varvec{\Sigma }\) for all \(t>0\) and \(j=0,1,...:\)

  1. (A1)

    there exists a sequence \(d_{j}\) with \(\sum _{j=1}^{\infty }d_{j}^{2}<\infty \) such that

    $$\begin{aligned} \max (T\left( \varvec{\Psi }_{j},t\right) ,T(\varvec{\Psi }_{j}^{^{\prime }},t))\le C_{1}d_{j}t^{-\alpha } \end{aligned}$$

    for some constants \(C_{1}>0\), \(\alpha >0\) ,

  2. (A2)

    \(T\left( \varvec{\Sigma },t\right) \le C_{2}t^{-\alpha }\) for some constants \(C_{2}>0\), \(\alpha >0\),

  3. (A3)

    \(\sum _{j=1}^{\infty }r_{j}^{2}<\infty \), where \(r_{j}=\max (\left\| \varvec{\Psi }_{j}\right\| _{(1,1)},\Vert {\varvec{\Psi }}_{j}^{^{\prime }}\Vert _{(1,1)})\),

  4. (A4)

    \(\sum _{j=1}^{\infty }r_{j}<\infty \), where \(r_{j}=\max (\left\| \varvec{\Psi }_{j}\right\| _{(1,1)},\Vert {\varvec{\Psi }}_{j}^{^{\prime }}\Vert _{(1,1)})\),

  5. (A5)

    \(\lambda _{\max }\left( \varvec{\Sigma }\right) \le C_{3}\) for some constant \(C_{3}>0\).

These conditions are restrictions on the parameter space. It is obvious that (A4) implies (A3), but the converse is not true. If the covariance matrix \( {\varvec{\Sigma }}=( \gamma _{ij}) \) is such that \(\left| \gamma _{ij}\right| \le C\left| i-j\right| ^{-\alpha -1}\) for all \(i,j\) and some \(\alpha >0\), then \(T\left( {\varvec{\Sigma }},t\right) \le C_{2}t^{-\alpha }\) and (A2) holds. Conditions (A1)-(A2) are tapering conditions and specify the rate of decay of the matrices \({\varvec{\Psi }}_{j}\) and \({\varvec{\Sigma }}\) away from the diagonal.

Of course, if (A3) holds then \(\left\| {{\varvec{\Gamma }}}_{k}\right\| _{(1,1)}<\infty \) (see (2)), and \(\sum _{j=0}^{\infty }\left\| {\varvec{\Psi }}_{j}\right\| _{(1,1)}^{2}<\infty \) implies that the series in (1) converges almost surely.

Next, in the “Remarks on the condition (A1) for AR(1) processes” and “Remarks on the condition (A1) for AR(r) processes” sections we discuss, in a few remarks, when condition (A1) is fulfilled for vector AR(1) and AR(r) processes. In the “Comparison of our results with previous studies” section we compare our main result, Theorem 1, with previous studies. In the “Conclusions” section we summarize the results related to Theorem 1 available in the literature.

2.1 The main results

The main results in this section concern the rate of convergence in operator norm for the kth order autocovariance matrix \({\varvec{\Gamma }}_{k}\) and for the precision matrix \({\varvec{\Gamma }}_{k}^{-1}\).

Theorem 1

Suppose (A1)-(A2) hold. Then

$$\begin{aligned} \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma } _{k}\big \Vert _{2}=\mathcal {O}_{P}( \Vert \varvec{\Sigma } \Vert _{(1,1)}( c_{n}) ^{\alpha /(\alpha +1)}) \end{aligned}$$
(4)

for \(l_{n}=( c_{n}) ^{-1/(\alpha +1)}\). Here \(c_{n}=\sqrt{n^{-1} \log p}\) when (((Gauss) and (A5)) or (SGauss)) and (A3) hold, and \( p={\mathcal {O}}( n^{\gamma /2}) \) for some \(\gamma >1\) as \( n\rightarrow \infty \); and \(c_{n}=p^{2/\beta }\sqrt{n^{-1} \log p}\) when (NGa\(_{\beta }\)) and (A4) hold, and \(p^{2/\beta }\sqrt{n^{-1} \log p} \rightarrow 0\) as \(n\rightarrow \infty \).

The proof of Theorem 1 is similar to the proofs of Theorems 1-2 in Bhattacharjee and Bose (2014b), but instead of their Lemmas 3, 5 we use a maximal inequality for the (SGauss) case and Pisier’s inequality (see (Pisier 1983)) for the (NGa\(_{\beta }\)) case.
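As a rough numerical illustration of Theorem 1 (not part of the original argument), the following sketch simulates a causal vector AR(1) process with a banded coefficient matrix \({\mathbf {A}}\) satisfying \(\Vert {\mathbf {A}}\Vert _{(1,1)}<1\), for which (A1) and (A3) hold (see the remarks on AR(1) processes below), and reports the operator-norm error of the banded estimator for \(l_{n}=c_{n}^{-1/(\alpha +1)}\), \(c_{n}=\sqrt{n^{-1}\log p}\); all numerical choices (dimensions, coefficients) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(A, n, burn=200):
    """Simulate X_t = A X_{t-1} + eps_t with eps_t ~ N(0, I); returns an (n, p) array."""
    p = A.shape[0]
    X = np.zeros(p)
    out = np.empty((n, p))
    for t in range(n + burn):
        X = A @ X + rng.standard_normal(p)
        if t >= burn:
            out[t - burn] = X
    return out

p, k, alpha = 50, 1, 1.0
A = 0.3 * np.eye(p) + 0.1 * np.eye(p, k=1)        # banded, ||A||_(1,1) = 0.4 < 2**(-alpha)
Sigma = np.eye(p)
# true Gamma_k = sum_{j >= k} A^j Sigma (A^{j-k})', truncated at a large J
Gamma_k = sum(np.linalg.matrix_power(A, j) @ Sigma @ np.linalg.matrix_power(A, j - k).T
              for j in range(k, 200))

for n in (200, 800, 3200):
    X = simulate_var1(A, n)
    c_n = np.sqrt(np.log(p) / n)
    l_n = int(np.ceil(c_n ** (-1.0 / (alpha + 1))))
    G_hat = X[k:].T @ X[:n - k] / (n - k)                   # sample autocovariance of order k
    i, j = np.indices((p, p))
    G_band = np.where(np.abs(i - j) <= l_n, G_hat, 0.0)     # banded estimator B_{l_n}
    err = np.linalg.norm(G_band - Gamma_k, 2)               # operator-norm error
    print(f"n={n:5d}  l_n={l_n}  error={err:.3f}  c_n^(a/(a+1))={c_n**(alpha/(alpha+1)):.3f}")
```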

Corollary 1

Under the assumptions of Theorem 1 and when \(\varvec{ \Gamma }_{k}^{-1}\) exists and \(\Vert {\varvec{\Gamma }} _{k}^{-1}\Vert _{2}={\mathcal {O}}(1) \), we have

$$\begin{aligned} \big \Vert \mathbf {B}_{l_{n}}^{-1}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma }_{k}^{-1}\big \Vert _{2}=\mathcal {O}_{P}( \Vert \varvec{\Sigma } \Vert _{(1,1)}( c_{n}) ^{\alpha /(\alpha +1)}) \text {,} \end{aligned}$$
(5)

where \(l_{n}\) and \(c_{n}\) are defined in Theorem 1.

Corollary 1 is a simple consequence of Theorem 1. We also obtain the rate of convergence of \({\varvec{{\hat{\Gamma }}}}_{k}\) in supremum norm, which will be used in the proof of Theorem 1.

Lemma 1

We have

$$\begin{aligned} \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }}_{k}\big \Vert _{\infty }={\mathcal {O}}_{P}\left( c_{n}\right) \text {,} \end{aligned}$$
(6)

where \(c_{n}=\sqrt{n^{-1} \log p}\) if (SGauss) and (A3) hold and \(p= {\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\) as \( n\rightarrow \infty \), and \(c_{n}=p^{2/\beta }\sqrt{n^{-1} \log p}\) when (NGa\(_{\beta }\)) and (A4) hold and \(p^{2/\beta }\sqrt{n^{-1} \log p} \rightarrow 0\) as \(n\rightarrow \infty \).

Next, we obtain non-asymptotic bounds on the error probability of the estimation of the kth order autocovariance.

Proposition 1

Suppose (A1)-(A2) hold. Then for any \(\eta >\tilde{C} _{1}\left\| {\varvec{\Sigma }}\right\| _{(1,1)}+{\tilde{C}}_{2}\),

$$\begin{aligned}&P\Big ( \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})- \varvec{\Gamma }_{k}\big \Vert _{2}>\eta \Big ) \nonumber \\&\quad \le 2p^{2}\sum _{i,j=0}^{\infty }\exp \Big ( -\frac{(n-k)^{2}(\eta -(\tilde{C} _{1}\left\| \varvec{\Sigma }\right\| _{(1,1)}+\tilde{C} _{2})l_{n}^{-\alpha })^{2}}{2\sigma ^{2}\left( 2l_{n}+1\right) ^{2}r_{i}^{2}r_{j}^{2}} \Big ) \end{aligned}$$
(7)

for some constants \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\), when (SGauss) and (A3) hold, and

$$\begin{aligned} P\Big ( \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{ \Gamma }_{k}\big \Vert _{2}>\eta \Big ) \le C_{\beta }C_{*}\frac{p^{2/\beta }n^{1/2}\left( 2l_{n}+1\right) }{(n-k)(\eta -(\tilde{C}_{1}\left\| \varvec{\Sigma }\right\| _{(1,1)}+\tilde{C}_{2})l_{n}^{-\alpha })} \end{aligned}$$
(8)

for some constants \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\), \(C_{*}=\left( \sum _{i=0}^{\infty }r_{i}\right) ^{2}\) and \(C_{\beta }\) depending on \(\beta \) when (NGa\(_{\beta }\)) and (A4) hold.

2.2 Remarks on the condition (A1) for AR(1) processes

As a special case, we consider a multivariate AR(1) process

$$\begin{aligned} \begin{aligned} {\mathbf {X}}_{t}=\sum _{j=0}^{\infty }{\mathbf {A}}^{j}{{\varepsilon }}_{t-j} \end{aligned} \end{aligned}$$
(9)

for some \(p\times p\) matrix \({\mathbf {A}}\). Then \(\varvec{\Psi }_{j}={\mathbf {A}} ^{j}\) for \(j=1,2,\dots \). Let \(t>0\). We impose one of two conditions on the matrix \({\mathbf {A}}\):

  1. (w1)

    there exists \(\alpha >0\), a constant \(C>0\), and a sequence \( b_{j} \) such that \(\sum _{j=1}^{\infty }jb_{j}\max (\left\| \mathbf {A} \right\| _{(1,1)}^{j-1},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)}^{j-1})<\infty \) and \(\max (\left\| \mathbf {A}\right\| _{(1,1)},\Vert \mathbf {A} ^{^{\prime }}\Vert _{(1,1)})<1\) and \(\max (T(\mathbf {A},t/ 2^{j-1}),T(\mathbf {A}^{^{\prime }},t/2^{j-1}))\le Cb_{j}t^{-\alpha }\).

  2. (w2)

    \(\max (\left\| \mathbf {A}\right\| _{(1,1)},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)})<2^{-\alpha }\) and \(\max (T( \mathbf {A},t/2^{j-1}),T(\mathbf {A}^{^{\prime }},t/2^{j-1} ))\le Ct^{-\alpha }\) for some \(\alpha >0\) and some constant \(C>0\).

Remark A

Condition (A1) is fulfilled for \(d_{j}=jb_{j}\max (\left\| {\mathbf {A}}\right\| _{(1,1)}^{j-1},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)}^{j-1})\) when (w1) holds, because from Lemma 5 (see “Appendix”)

$$\begin{aligned} \begin{aligned} \max (T ( \varvec{\Psi }_{j},t) ,T(\varvec{\Psi } _{j}^{^{\prime }} ,t))=&{} \max (T ( {\mathbf {A}} ^{j},t) ,T(({\mathbf {A}} ^{^{\prime }} ) ^{j} ,t)) \\\le&{} j\max (\Vert {\mathbf {A}}\Vert _{(1,1)}^{j-1}T({\mathbf {A}},t/2^{j-1}),\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)}^{j-1}T({\mathbf {A}}^{^{\prime }},t/2^{j-1})). \end{aligned} \end{aligned}$$

Similarly, condition (A1) is fulfilled for \(d_{j}=j2^{(j-1)\alpha }\max (\Vert {\mathbf {A}}\Vert _{(1,1)}^{j-1},\Vert {\mathbf {A}} ^{^{\prime }}\Vert _{(1,1)}^{j-1})\) when (w2) holds. Condition (A3) holds when \(\max (\left\| {\mathbf {A}}\right\| _{(1,1)},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)})<1\), because

$$\begin{aligned} \sum _{j=1}^{\infty }r_{j}=\sum _{j=1}^{\infty }\max (\Vert \varvec{\Psi } _{j}\Vert _{(1,1)},\Vert \varvec{\Psi }_{j}^{^{\prime }}\Vert _{(1,1)})\le \sum _{j=1}^{\infty }\max (\Vert {\mathbf {A}} \Vert _{(1,1)}^{j},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)}^{j})\text {.} \end{aligned}$$
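The bound from Lemma 5 that underlies Remark A can be checked numerically. The sketch below uses an illustrative matrix with geometrically decaying entries \(0.3\cdot 0.4^{\left| i-j\right| }\) (so that \(\left\| {\mathbf {A}}\right\| _{(1,1)}<1\)) and compares \(T({\mathbf {A}}^{j},t)\) with \(j\left\| {\mathbf {A}}\right\| _{(1,1)}^{j-1}T({\mathbf {A}},t/2^{j-1})\).

```python
import numpy as np

def tail(M, t):
    """T(M, t) = max over columns j of sum_{i: |i-j| > t} |m_ij|."""
    i, j = np.indices(M.shape)
    return np.max(np.sum(np.where(np.abs(i - j) > t, np.abs(M), 0.0), axis=0))

p, t = 40, 4.0
i, j = np.indices((p, p))
A = 0.3 * 0.4 ** np.abs(i - j)                    # entries decay geometrically off the diagonal
norm11 = np.max(np.sum(np.abs(A), axis=0))        # ||A||_(1,1), here about 0.7 < 1
for jp in range(1, 6):
    lhs = tail(np.linalg.matrix_power(A, jp), t)
    rhs = jp * norm11 ** (jp - 1) * tail(A, t / 2 ** (jp - 1))
    print(f"j={jp}:  T(A^j, t) = {lhs:.3e}  <=  bound = {rhs:.3e}")
```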

2.3 Remarks on the condition (A1) for AR(r) processes

Now, we consider a multivariate autoregressive process of order r (AR(r)) given by

$$\begin{aligned} \begin{aligned} {{\mathbf {X}}}_{{t}}=\sum _{i=1}^{r}{\mathbf {A}}_{i}{\mathbf {X}}_{t-i}+{ \varepsilon }_{t}\text{, } \end{aligned} \end{aligned}$$

\(t\ge 1\), where the \(p\times p\) matrices \({\mathbf {A}}_{i}\), \( i=1,\ldots ,r\), are called parameter matrices. Under some regularity conditions (for more details see Bhattacharjee and Bose (2014b), Eq. (4), p. 264, and Brockwell and Davis (2002)) this process has the representation (1), where \({\varvec{\Psi }}_{0}={\mathbf {I}}\) and

$$\begin{aligned} \begin{aligned} {\varvec{\Psi }}_{{j}}=\sum _{i=1}^{\min (j,r)}{\mathbf {A}}_{i}\varvec{\Psi }_{j-i} \text{ for } j=1,2,\dots \end{aligned} \end{aligned}$$
(10)

We consider the class \({\mathcal {A}}\) of r-sequences of matrices, defined by

$$\begin{aligned} \begin{aligned} \mathcal {A}=\left\{ ({\mathbf {A}}_{1},\ldots ,{\mathbf {A}}_{r}): \begin{array}{l} \max (\Vert {{\mathbf {A}}}_{i}\Vert _{(1,1)},\Vert {\mathbf {A}} _{i}^{^{\prime }}\Vert _{(1,1)})\le C_{1}\delta ^{i} \text{ and, } \text{ for } \text{ all } t>0\text{, } \\ \max (T\left( {{\mathbf {A}}}_{i},t\right) ,T({\mathbf {A}}_{i}^{^{\prime }},t))\le C_{2}\delta ^{i}t^{-\alpha }\text{, } i=1,2,\ldots ,r\text{, } \\ \text{ for } \text{ some } 0<\delta <1\text{, } \alpha>0 \text{ and } \text{ some } \text{ constant } C_{2}>0\text{, } \text{ where } C_{1}=1/\left( 2^{\alpha }r\right) \end{array} \right\} \end{aligned} \end{aligned}$$

Observe that if \(T\left( {\mathbf {A}}_{i},t\right) \le C_{2}\delta ^{i}t^{-\alpha }\) for some \(0<\delta <1\) and

$$\begin{aligned} \max _{1\le k\le p}\left| a_{i}^{k,k}\right| \le C\delta ^{i}, \end{aligned}$$

where \({\mathbf {A}}_{i}=\left( a_{i}^{s,k}\right) \), then \(\left\| {\mathbf {A}}_{i}\right\| _{(1,1)}\le \left( 2^{\alpha }C_{2}+C\right) \delta ^{i}\). Indeed, for any \( t>0\),

$$\begin{aligned} \left\| \mathbf {A}_{i}\right\| _{(1,1)}\le & {} T(\mathbf {A} _{i},t)+\max _{1\le k\le p}\sum _{s:\left| s-k\right| \le t}\left| a_{i}^{s,k}\right| \\\le & {} C_{2}\delta ^{i}t^{-\alpha }+\max _{1\le k\le p}\sum _{s:\left| s-k\right| \le t}\left| a_{i}^{s,k}\right| \text {.} \end{aligned}$$

Therefore, putting \(t=1/2\), we get

$$\begin{aligned} \left\| {\mathbf {A}}_{i}\right\| _{(1,1)}\le C_{2}2^{\alpha }\delta ^{i}+\max _{1\le k\le p}\left| a_{i}^{k,k}\right| \le \left( 2^{\alpha }C_{2}+C\right) \delta ^{i}\text {.} \end{aligned}$$

Similarly, we may obtain \(\Vert {\mathbf {A}} _{i}^{^{\prime }} \Vert _{(1,1)}\le ( 2^{\alpha }C_{2}+C) \delta ^{i}\).

Proposition 2

Under the condition of class \({\mathcal {A}}\), we have

$$\begin{aligned} \max (\Vert \varvec{\Psi }_{j}\Vert _{(1,1)},\Vert \varvec{ \Psi }_{j}^{^{\prime }}\Vert _{(1,1)})\le C_{1}\delta ^{j} \end{aligned}$$
(11)

and

$$\begin{aligned} \max (T\left( \varvec{\Psi }_{j},t\right) ,T(\varvec{\Psi }_{j}^{^{\prime }},t))\le C_{2}j\delta ^{j}t^{-\alpha } \end{aligned}$$
(12)

for \(j=1,2,\ldots \)

Remark B

It follows immediately from Proposition 2 that, under the conditions of class \({\mathcal {A}}\), condition (A1) holds with \( d_{j}=j\delta ^{j}\) for multivariate AR(r) processes, as the sketch below illustrates numerically.
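The following is a small, hedged illustration of the recursion (10) and the geometric decay (11): the two tridiagonal parameter matrices are chosen so that they satisfy the conditions of class \({\mathcal {A}}\) with \(r=2\), \(\delta =0.5\), \(\alpha =1\) and \(C_{1}=1/(2^{\alpha }r)=0.25\); all numerical choices are illustrative.

```python
import numpy as np

def psi_matrices(A_list, J):
    """Psi_0 = I and Psi_j = sum_{i=1}^{min(j,r)} A_i Psi_{j-i}, as in (10)."""
    p = A_list[0].shape[0]
    r = len(A_list)
    Psi = [np.eye(p)]
    for j in range(1, J + 1):
        Psi.append(sum(A_list[i] @ Psi[j - 1 - i] for i in range(min(j, r))))
    return Psi

p, delta = 30, 0.5
ii, jj = np.indices((p, p))
base = np.where(np.abs(ii - jj) <= 1, 1.0, 0.0)      # tridiagonal 0/1 pattern
base /= np.abs(base).sum(axis=0).max()               # rescale so that ||base||_(1,1) = 1
A_list = [0.25 * delta ** i * base for i in (1, 2)]  # ||A_i||_(1,1) = C_1 * delta^i, C_1 = 0.25

for j, P in enumerate(psi_matrices(A_list, 8)):
    # Proposition 2 predicts ||Psi_j||_(1,1) <= C_1 * delta^j = 0.25 * 0.5^j for j >= 1
    print(j, round(float(np.abs(P).sum(axis=0).max()), 6))
```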

2.4 Comparison of our results with previous studies

Similar results to those given in Theorem 1 have been presented in Bickel and Levina (2008b) for i.i.d. Gaussian observations \({{\mathbf {X}}}_{1},...,{\mathbf {X}}_{n}\) and in Bhattacharjee and Bose (2014b) for a p-dimensional linear process with \(n^{-1} \log p\rightarrow 0\). In Bhattacharjee and Bose (2014b) it is assumed that the matrix \(\varvec{ \Sigma }\) belongs to the class

$$\begin{aligned} \mathcal {U=}\big \{ \varvec{\Sigma :}\text { }0<\varepsilon<\lambda _{\min }\left( \varvec{\Sigma }\right) \le \lambda _{\max }\left( \varvec{\Sigma } \right)<1/\varepsilon ,T\left( \varvec{\Sigma ,}t\right) <Ct^{-\alpha } \text { for all }t>0\big \} \text {,} \end{aligned}$$

where \(\varepsilon ,\alpha ,C>0\) and \(\lambda _{\min }\left( \varvec{\Sigma } \right) \), \(\lambda _{\max }\left( \varvec{\Sigma }\right) \) are respectively the minimum and maximum eigenvalues of \(\varvec{\Sigma }\). The coefficient matrices \(\left( \varvec{\Psi }_{j}\right) \) are assumed to lie in \({\mathcal {T}} _{\beta ,\lambda }\cap {\mathcal {G}}_{\alpha ,\eta ,\nu }\) for some \(0<\beta <1 \), \(\lambda \ge 0\), \(\alpha \), \(\nu >0\), \(0<\eta <1\), where

$$\begin{aligned} {\mathcal {T}}_{\beta ,\lambda }=\big \{\big ( \varvec{\Psi }_{j}\big ) :\sum _{j=0}^{\infty }r_{j}^{\beta }<\infty ,\sum _{j=0}^{\infty }r_{j}^{2( 1-\beta ) }j^{\lambda }<\infty \big \}\text {,} \end{aligned}$$
$$\begin{aligned} {\mathcal {G}}_{\alpha ,\eta ,\nu }= & {} \Big \{\left( \varvec{\Psi }_{j}\right) :T( \varvec{\Psi }_{j},t\sum _{u=0}^{j}\eta ^{u})<Ct^{-\alpha }r_{j}j^{\nu }\sum _{u=0}^{j}\eta ^{-u\alpha },\text { and} \\&\sum _{j=k}^{\infty } \frac{r_{j}r_{j-k}j^{\nu }}{\eta ^{\alpha j}}<\infty \Big \}\text {,} \end{aligned}$$

with \(r_{j}=\max (\Vert \varvec{\Psi }_{j}\Vert _{(1,1)},\Vert {\varvec{\Psi }}_{j}{^{^{\prime }}}\Vert _{(1,1)})\). Additionally they assumed that for some \(\lambda _{0}>0\),

$$\begin{aligned} \sup _{j\ge 1}E\left( e^{\lambda \varepsilon _{1,j}}\right)<\infty \text { for all }\left| \lambda \right| <\lambda _{0}\text {.} \end{aligned}$$
(13)

In our Theorem 1 we consider (Gauss), (SGauss) or (NGa\(_{\beta }\)) for \(\left( {\varepsilon }_{i}\right) \). In particular, condition (NGa\(_{\beta }\)) is much weaker than (13). Membership of \({\varvec{\Sigma }}\) in the class \({\mathcal {U}}\) implies our conditions (A2) and (A5). Moreover, conditions (A1) and (A3) are weaker than the requirement \(\left( \varvec{\Psi }_{j}\right) \in {\mathcal {T}}_{\beta ,\lambda }\cap {\mathcal {G}}_{\alpha ,\eta ,\nu }\). For example, if \(r_{j}\sim j^{-a}\) for \(a>1\), then (A4) holds, but the conditions in \({\mathcal {T}}_{\beta ,\lambda }\) are satisfied only for \(a>\max ( 1/\beta ,(\lambda +1)/(2( 1-\beta )))\), and those in \({\mathcal {G}}_{\alpha ,\eta ,\nu }\) are satisfied for geometrically decaying \(r_{j}\sim b^{j}\) with some \(0<b<1\).

Bickel and Levina (2008a) considered the asymptotic behavior of the thresholding estimator of the covariance matrix \({\varvec{\Gamma }}_{0}\) of the form

$$\begin{aligned} \begin{aligned}T_{u}(\varvec{\hat{\Gamma }}_{0})=(\hat{\gamma }_{ij}^{0}\mathbf {1}(\left| \hat{\gamma }_{ij}^{0}\right| \ge u))\text{. } \end{aligned} \end{aligned}$$
(14)

For Gaussian data, where \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \) and

$$\begin{aligned} {\mathcal {G}}_{r}( M) :=\big \{{\varvec{\Gamma }}_{0}=( \gamma _{ij}) :\text { }\max _{1\le i\le p}\gamma _{ii}\le 1\text {, } \max _{1\le i\le p}\sum _{j=1}^{p}\vert \gamma _{ij}\vert ^{r}\le M\big \},0\le r<1, \end{aligned}$$
(15)

they obtained

$$\begin{aligned} \big \Vert T_{u}({\varvec{{\hat{\Gamma }}}}_{0})-{\varvec{\Gamma }}_{0}\big \Vert _{2}={\mathcal {O}}_{P}(Mu^{1-r}) \end{aligned}$$
(16)

where \(u=C(\log p)^{1/2}n^{-1/2}\) for a sufficiently large constant C. For nonnormal data where \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \), they obtained (16) for \(u=Cp^{2/q}n^{-1/2}\), for a sufficiently large constant C (see (Chen et al. 2013), (48)).

Cai et al. (2010) obtained the minimax upper bound for a special class of tapering estimators of \({\varvec{\Gamma }}_{0}\) for i.i.d. observations under (A2) and (A5). Their upper rate bound equals \({\mathcal {O}}_{ P}\big ( \min \left\{ n^{-2\alpha /(2\alpha +1)}+n^{-1} \log p,p/n\right\} \big )\) and is better than our result for (Gauss) and (SGauss) from Theorem 1. This means that for i.i.d. observations our result is suboptimal.

A sharper rate than in (16) for a nonnormal case was obtained by Chen et al. (2013), where data come from a general weak dependence model. In particular, when data come from a linear process (1) with (NGa\( _{\beta }\)) for \(\beta =2q\) for some \(q>2\), and the coefficient matrices \( \varvec{\Psi }_{j}=\left( \psi _{k,s}\left( j\right) \right) _{1\le k,s\le p}\) satisfy

$$\begin{aligned} \max _{1\le k\le p}\sum _{s=1}^{p}\left( \psi _{k,s}\left( j\right) \right) ^{2}={\mathcal {O}}( j^{-\left( 2+2\gamma \right) }) \end{aligned}$$

for some \(\gamma >1/2-1/q\), \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \) and \(M<p\), then one can obtain (16) for \(u=u_{*}\), where \(u_{*}=\max \left( u_{1},u_{2},u_{3}\right) \), \(u_{1}=M^{\frac{1}{ q+r}}p^{\frac{1}{q+r}}n^{\frac{1-q}{q+r}}\), \(u_{2}=\sqrt{n^{-1} \log p}\), \(u_{3}=M^{-\frac{2}{q-2r}}p^{\frac{2}{q-2r}}n^{\frac{(1-q)}{q-2r}}\)(for more details see discussion preceding Corollary 2.7 in Chen et al. (2013)).

Under condition (Gauss) or (SGauss), we obtain the same rate of convergence as in Bickel and Levina (2008a) for the covariance matrix \(\varvec{ \Gamma }_{0}\) for normal data and as in Bhattacharjee and Bose (2014b) for \({\varvec{\Gamma }}_{k}\). In particular, if \({{\varvec{\Gamma }}}_{\mathbf {0}}\in {\mathcal {G}}_{r}\left( M\right) \) then \(\left\| \varvec{\Sigma }\right\| _{(1,1)}\le M\), and from Theorem 1 we have

$$\begin{aligned} \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{0})-\varvec{\Gamma } _{0}\big \Vert _{2}=\mathcal {O}_{P}( M(n^{-1} \log p ) ^{\frac{\alpha }{2(\alpha +1)}}) \text {.} \end{aligned}$$
(17)

If \(M\sim p^{\eta }\) for some \(\eta \in [0,1)\), then the r.h.s. of (16) for \(u=u_{*}\) equals

$$\begin{aligned} \max \big ( p^{\frac{(1-r)(2-2\eta )}{q+r}+\eta }n^{\frac{(1-q)(1-r)}{q+r} },p^{\eta }(n^{-1} \log p) ^{\frac{1-r}{2}},p^{\eta +\frac{ (1-r)(2-2\eta )}{q-2r}}n^{\frac{(1-q)(1-r)}{q-2r}}\big ) \text {,} \end{aligned}$$
(18)

which is greater than the r.h.s. of (17), \(p^{\eta }\left( n^{-1} \log p\right) ^{\frac{\alpha }{2(\alpha +1)}}\). Thus, our rate of convergence is better than the rate (16) for \(u=u_{*}\) in Chen et al. (2013) for a linear process.

Guo et al. (2016) worked with a banded covariance estimator with \( l_{n}=C\log (n/\log p)\) for some \(C>0\) for a sparse multivariate AR model. The rate of convergence in operator norm was \({\mathcal {O}}_{P}(\log ( n/\log p)\sqrt{n^{-1} \log p})\). In a special case, from our Theorem 1 we obtain the better rate \({\mathcal {O}}_{P}(\log ^{\alpha }(n^{-1} \log p))\).

Jentsch and Politis (2015) dealt with so-called flat-top tapered covariance matrix estimation for a multivariate strictly stationary time series. In the special case when the observations come from a p-dimensional linear process \({\mathbf {X}}_{t}=\sum _{j=-\infty }^{\infty } \varvec{\Psi }_{j}{\varepsilon }_{t-j}\), where the sequence \(\left( \varvec{\Psi }_{j}\right) \) of coefficient matrices is component-wise absolutely summable (so that \(\sum _{h=-\infty }^{\infty }\sum _{i,j=1}^{p}\left| \gamma _{i,j}(h)\right| <\infty \), where \( {\varvec{\Gamma }}_{h}=\left( \gamma _{i,j}\left( h\right) \right) \)) with i.i.d. noise \(\left( {\varepsilon }_{t}\right) \) with finite fourth moments, they obtained

$$\begin{aligned} \Big ( E\big \Vert {\varvec{{\hat{\Gamma }}}}_{k,l}-{\varvec{\Gamma }} _{k}\big \Vert _{2}^{2}\Big ) ^{1/2}={\mathcal {O}}\Big ( \frac{lp^{2}}{\sqrt{n}} +\sum _{h=l+1}^{n-1}\sum _{i,j=1}^{p}\big \vert \gamma _{i,j}(h)\big \vert \Big ) \text {,} \end{aligned}$$
(19)

where \({\varvec{{\hat{\Gamma }}}}_{k,l}\) is a flat-top tapered covariance matrix estimator of \({\varvec{\Gamma }}_{k}\) and \(l=o\left( \sqrt{n}\right) \) is the banding parameter. If \(\left| \psi _{i,j}(h)\right| \le Ch^{-(1+\alpha )}\) for some \( C>0\) and \(\alpha >0\), where \(\varvec{\Psi }_{h}=\left( \psi _{i,j}(h)\right) \), then (A1) holds, and under (A2) and (SGauss) we deduce from Theorem 1 that the r.h.s. of (4) is of order \({\mathcal {O}}_{P} ((n^{-1}\log p)^{\alpha /2(\alpha +1)})\). Therefore \( \left| \gamma _{i,j}(h)\right| ={\mathcal {O}}\left( h^{-\alpha -1}\right) \), and for \(l=n^{1/2-\kappa }\) for some \(\kappa \in (0,1/2)\) we have \(\sum _{h=l+1}^{n-1}\sum _{i,j=1}^{p}\left| \gamma _{i,j}(h)\right| ={\mathcal {O}}\left( l^{-\alpha }\right) \), so the r.h.s. of (19) is of order \({\mathcal {O}}\left( p^{2}n^{-\kappa }+l^{-\alpha }\right) ={\mathcal {O}}(p^{2}n^{-\kappa }+n^{-\left( \frac{1}{2}-\kappa \right) \alpha })\), which is a worse rate than in Theorem 1. Similarly, when (NGa\(_{\beta }\)) holds and \(p={\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\), then from Theorem 1 the r.h.s. of (4) is of order \({\mathcal {O}}_{P}( ( p^{2/\beta }\sqrt{ n^{-1} \log p}) ^{\alpha /(\alpha +1)}) \), and this rate is sharper than the r.h.s. of (19).

2.5 Conclusions

Our main result (Theorem 1) was compared with related results available in the literature. In the special case of a multidimensional linear process we obtained a better rate of convergence of our covariance estimator in operator norm than those of Chen et al. (2013), Jentsch and Politis (2015) and Guo et al. (2016). Our result is similar to that of Bhattacharjee and Bose (2014b), but holds under milder assumptions on the noise process \(\left( { \varepsilon }_{t}\right) \) and on the admissible class of matrices \(\varvec{ \Gamma }_{k}\).

Comparing results on covariance matrix estimation is difficult, because they are obtained under different assumptions on the class of covariance matrices and for independent or dependent data. For independent data, Cai et al. (2010) obtained the optimal rate for tapering estimators of \(\varvec{\Gamma }_{0}\). In contrast, such results do not exist for dependent data, and the problem of finding the optimal rate of convergence in Theorem 1 is still open.

3 Appendix

Lemma 2

(Bhattacharjee and Bose 2014b). For any matrices \({\mathbf {A}}\) , \({\mathbf {B}}\) and for all \(\alpha \), \(\beta \), \(t>0\),

  1. (i)

    \(\left\| \mathbf {AB}\right\| _{(1,1)}\le \left\| \mathbf {A}\right\| _{(1,1)}\left\| \mathbf {B}\right\| _{(1,1)}\),

  2. (ii)

    \(T\left( \mathbf {AB,}\left( \alpha +\beta \right) t\right) \le \left\| \mathbf {A}\right\| _{(1,1)}T\left( \mathbf {B},\alpha t\right) +\left\| \mathbf {B}\right\| _{(1,1)}T\left( \mathbf {A},\beta t\right) \) .

Lemma 3

Suppose (A1)–(A2) and (A3) hold. Then, for all \(t>0\),

$$\begin{aligned} T\left( \varvec{\Gamma }_{k},t\right) \le (\tilde{C}_{1}\left\| \varvec{ \Sigma }\right\| _{(1,1)}+\tilde{C}_{2})t^{-\alpha } \end{aligned}$$

for some \(\alpha >0\), where \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\) are some constants.

Proof

Observe

$$\begin{aligned} \begin{aligned} T\left( {\varvec{\Gamma }}_{k},t\right) =T(\sum _{j=k}^{\infty }\varvec{\Psi }_{j}\varvec{\Sigma \Psi }_{j-k}^{^{\prime }},t)\le \sum _{j=k}^{\infty }T( \varvec{\Psi }_{j}\varvec{\Sigma \Psi }_{j-k}^{^{\prime }},t)\text{. } \end{aligned} \end{aligned}$$

From Lemma 2(ii), we have

$$\begin{aligned} T\left( {\varvec{\Gamma }}_{k},t\right) \le \sum _{j=k}^{\infty }\Vert \varvec{\Psi }_{j}\Vert _{(1,1)}T(\varvec{\Sigma \Psi } _{j-k}^{^{\prime }},t/2)+\sum _{j=k}^{\infty }\Vert \varvec{ \Sigma \Psi }_{j-k}^{^{\prime }}\Vert _{(1,1)}T(\varvec{\Psi }_{j}, t/2). \end{aligned}$$

It follows from Lemma 2(i) that

$$\begin{aligned} \begin{aligned} \Vert \varvec{\Sigma \Psi } _{j-k}^{^{\prime }}\Vert _{(1,1)}\le \Vert \varvec{\Sigma }\Vert _{(1,1)}\Vert \varvec{\Psi } _{j-k}^{^{\prime }}\Vert _{(1,1)}\text{. } \end{aligned} \end{aligned}$$

Hence

$$\begin{aligned} T\left( {\varvec{\Gamma }}_{k},t\right) \le \sum _{j=k}^{\infty }r_{j}T( \varvec{\Sigma \Psi }_{j-k}^{^{\prime }},t/2)+\sum _{j=k}^{\infty }r_{j-k}\left\| \varvec{\Sigma }\right\| _{(1,1)}T(\varvec{\Psi }_{j}, t/2)\text {.} \end{aligned}$$

Again using Lemma 2(ii), we obtain

$$\begin{aligned} \begin{aligned} T(\varvec{\Sigma \Psi } _{j-k}^{^{\prime }},t/2)\le \Vert {\varvec{\Sigma }\Vert }_{{(1,1)}}T(\varvec{\Psi }_{j-k}^{^{\prime }}, t/4)+\Vert \varvec{\Psi } _{j-k}^{^{\prime }}\Vert _{(1,1)}T(\varvec{\Sigma }, t/4) \end{aligned} \end{aligned}$$

and

$$\begin{aligned} T\left( {\varvec{\Gamma }}_{k},t\right)\le & {} \sum _{j=k}^{\infty }r_{j}(\left\| \varvec{\Sigma }\right\| _{(1,1)}T(\varvec{\Psi } _{j-k}^{^{\prime }},t/4)+r_{j-k}T(\varvec{\Sigma }, t/4)) \\&+\sum _{j=k}^{\infty }r_{j-k}\left\| \varvec{\Sigma }\right\| _{(1,1)}T(\varvec{\Psi }_{j},t/2)\text {.} \end{aligned}$$

By assumptions (A1)-(A2), we see that

$$\begin{aligned} T\left( {\varvec{\Gamma }}_{k},t\right)\le & {} \sum _{j=k}^{\infty }r_{j}(\left\| \varvec{\Sigma }\right\| _{(1,1)}4^{\alpha }C_{1}d_{j-k}t^{-\alpha }+r_{j-k}4^{\alpha }C_{2}t^{-\alpha }) \nonumber \\&+\sum _{j=k}^{\infty }r_{j-k}\left\| \varvec{\Sigma }\right\| _{(1,1)}2^{\alpha }C_{1}d_{j}t^{-\alpha } \nonumber \\\le & {} \left\| \varvec{\Sigma }\right\| _{(1,1)}4^{\alpha }C_{1}t^{-\alpha }\sum _{j=k}^{\infty }r_{j}d_{j-k}+4^{\alpha }C_{2}t^{-\alpha }\sum _{j=k}^{\infty }r_{j}r_{j-k} \nonumber \\&+\left\| \varvec{\Sigma }\right\| _{(1,1)}2^{\alpha }C_{1}t^{-\alpha }\sum _{j=k}^{\infty }r_{j-k}d_{j}\text {.} \end{aligned}$$
(20)

From the Schwarz inequality, (A1) and (A3), we have

$$\begin{aligned} \sum _{j=k}^{\infty }r_{j}d_{j-k}\le \sqrt{\sum _{j=k}^{\infty }r_{j}^{2}} \sqrt{\sum _{j=k}^{\infty }d_{j-k}^{2}}<\infty \end{aligned}$$

and similarly \(\sum _{j=k}^{\infty }r_{j-k}d_{j}<\infty \), \( \sum _{j=k}^{\infty }r_{j}r_{j-k}<\infty \). Therefore, from (20), we get

$$\begin{aligned} T\left( \varvec{\Gamma }_{k},t\right) \le (\tilde{C}_{1}\left\| \varvec{ \Sigma }\right\| _{(1,1)}+\tilde{C}_{2})t^{-\alpha } \end{aligned}$$
(21)

for some constants \({\tilde{C}}_{1}>0\) and \({\tilde{C}}_{2}>0\). \(\square \)

Lemma 4

Suppose (SGauss) and (A3) hold. Then for any \(\eta >0\),

$$\begin{aligned} P\Big ( \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\big \Vert _{\infty }>\eta \Big ) \le 2p^{2}\sum _{i,j=0}^{\infty }\exp \Big ( - \frac{( n-k) ^{2}\eta ^{2}}{2\sigma ^{2}r_{i}^{2}r_{j}^{2}}\Big ) \text { .} \end{aligned}$$
(22)

Suppose (NGa\(_{\beta }\)) and (A4) hold. Then for any \(\eta >0\),

$$\begin{aligned} P\Big ( \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\big \Vert _{\infty }>\eta \Big ) \le C_{\beta }C^{*}\frac{p^{2/\beta }n^{1/2}}{( n-k) \eta }\text {,} \end{aligned}$$
(23)

where \(C^{*}:=\left( \sum _{i=0}^{\infty }r_{i}\right) ^{2}\) and \( C_{\beta }\) is some constant depending on \(\beta \).

Proof

By a simple calculation,

$$\begin{aligned}&\big \Vert \varvec{\hat{\Gamma }}_{k}-\varvec{\Gamma }_{k} \big \Vert _{\infty }=\Big \Vert \frac{1}{n-k}\sum _{m=k+1}^{n}\mathbf {X}_{m}\mathbf {X} _{m-k}^{^{\prime }}-E(\mathbf {X}_{m}\mathbf {X}_{m-k}^{^{\prime }})\Big \Vert _{\infty } \\&\quad =\Big \Vert \frac{1}{n-k}\sum _{m=k+1}^{n}\sum _{j=0}^{\infty }\sum _{i=0}^{\infty }\varvec{\Psi }_{j}{\varepsilon }_{m-j}{ \varepsilon }_{m-i-k}^{^{\prime }}\varvec{\Psi }_{i}^{^{\prime }} -\\&\qquad \frac{1}{ n-k}\sum _{m=k+1}^{n}\sum _{j=0}^{\infty }\sum _{i=0}^{\infty }\varvec{\Psi } _{j}E({\varepsilon }_{m-j}{\varepsilon } _{m-i-k}^{^{\prime }})\varvec{\Psi }_{i}^{^{\prime }}\Big \Vert _{\infty } \\&\quad =\Big \Vert \frac{1}{n-k}\sum _{m=k+1}^{n}\sum _{j=0}^{\infty }\sum _{i=0}^{\infty }\varvec{\Psi }_{j}({\varepsilon }_{m-j}{ \varepsilon }_{m-i-k}^{^{\prime }}-E({\varepsilon }_{m-j} {\varepsilon }_{m-i-k}^{^{\prime }}))\varvec{\Psi }_{i}^{^{\prime }}\Big \Vert _{\infty } \\&\quad \le \frac{1}{n-k}\sum _{j=0}^{\infty }\sum _{i=0}^{\infty }\Big \Vert \varvec{ \Psi }_{j}\varvec{\Psi }_{i}^{^{\prime }}\Big \Vert _{(1,1)}\Big \Vert \sum _{m=k+1}^{n}{\varepsilon }_{m-j}{\varepsilon } _{m-i-k}^{^{\prime }}-E({\varepsilon }_{m-j}{ \varepsilon }_{m-i-k}^{^{\prime }})\Big \Vert _{\infty } \\&\quad \le \frac{1}{n-k}\sum _{j=0}^{\infty }\sum _{i=0}^{\infty }r_{j}r_{i}\Big \Vert \sum _{m=k+1}^{n}{\varepsilon }_{m-j}{ \varepsilon }_{m-i-k}^{^{\prime }}-E({\varepsilon }_{m-j} {\varepsilon }_{m-i-k}^{^{\prime }})\Big \Vert _{\infty }\text {.} \end{aligned}$$

Hence

$$\begin{aligned}&P\Big ( \Big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\Big \Vert _{\infty }>\eta \Big ) \\&\quad \le P\Big ( \sum _{j=0}^{\infty }\sum _{i=0}^{\infty }r_{j}r_{i}\Big \Vert \sum _{m=k+1}^{n}{\varepsilon }_{m-j}{\varepsilon } _{m-i-k}^{^{\prime }}-E({\varepsilon }_{m-j}{ \varepsilon }_{m-i-k}^{^{\prime }})\Big \Vert _{\infty }>(n-k)\eta \Big ) \\&\quad =P\Big ( \sum _{j=0}^{\infty }\sum _{i=0}^{\infty }r_{j}r_{i}\max _{1\le l,s\le p}\Big \vert \sum _{m=k+1}^{n}{\varepsilon }_{m-j,l}{ \varepsilon }_{m-i-k,s}-E\Big ( {\varepsilon }_{m-j,l} {\varepsilon }_{m-i-k,s}\Big ) \Big \vert \\ {}&\quad >(n-k)\eta \Big ) \text {.} \end{aligned}$$

Put \(\zeta _{m,i,j}^{l,s}:={\varepsilon }_{m-j,l}{\varepsilon } _{m-i-k,s}\). Then

$$\begin{aligned} P\big ( \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\big \Vert _{\infty }>\eta \big ) \le \sum _{j,i=0}^{\infty }P\big ( \max _{1\le l,s\le p}\big \vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E\big ( \zeta _{m,i,j}^{l,s}\big ) \big \vert > \frac{(n-k)\eta }{r_{j}r_{i}}\big ) \text {.} \nonumber \\ \end{aligned}$$
(24)

From (SGauss) and (A3) we conclude that

$$\begin{aligned}&P\Big ( \max _{1\le l,s\le p}\Big \vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E\Big ( \zeta _{m,i,j}^{l,s}\Big ) \Big \vert> \frac{(n-k)\eta }{r_{j}r_{i}}\Big ) \\&\quad \le \sum _{s=1}^{p}P\Big ( \max _{1\le l\le p}\Big \vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E\Big ( \zeta _{m,i,j}^{l,s}\Big ) \Big \vert >\frac{(n-k)\eta }{r_{j}r_{i}}\Big ) \text {.} \end{aligned}$$

Since for fixed s, \((\zeta _{m,i,j}^{l,s})\) is sub-Gaussian with constant \( \sigma ^{2}\), we have

$$\begin{aligned} E\exp (u\zeta _{m,i,j}^{l,s})\le \exp (u^{2}\sigma ^{2}/2) \text {.} \end{aligned}$$

Hence, from the maximal inequality for sub-Gaussian r.v.’s we obtain

$$\begin{aligned} P\Big ( \max _{1\le l\le p}\left| \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E(\zeta _{m,i,j}^{l,s})\right| >\frac{(n-k)\eta }{r_{j}r_{i}}\Big ) \le 2p\exp \Big ( -\frac{\left( n-k\right) ^{2}\eta ^{2}}{2\sigma ^{2}r_{j}^{2}r_{i}^{2}}\Big ) \text {.} \end{aligned}$$

Hence, we have

$$\begin{aligned} P\Big ( \Big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\Big \Vert _{\infty }>\eta \Big ) \le 2p^{2}\sum _{i,j=0}^{\infty }\exp \Big ( - \frac{( n-k) ^{2}\eta ^{2}}{2\sigma ^{2}r_{j}^{2}r_{i}^{2}}\Big ) \text { .} \end{aligned}$$

Suppose (NGa\(_{\beta }\)) and (A4) hold. Applying to the r.h.s. of (24) Pisier’s maximal inequality (Pisier 1983)

$$\begin{aligned} E\max _{1\le i\le N}\left| Z_{i}\right| \le N^{1/Q}\max _{1\le i\le N}\left\| Z_{i}\right\| _{Q}\text {,} \end{aligned}$$

which holds for any random variables \(\left( Z_{i}\right) \) with \(\left\| Z_{i}\right\| _{Q}=E^{1/Q}\left| Z_{i}\right| ^{Q}<\infty \) for \(Q>1\), and putting \(Q=\beta \), we obtain

$$\begin{aligned}&P\Big ( \max _{1\le l,s\le p}\Big \vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E(\zeta _{m,i,j}^{l,s})\Big \vert >\frac{(n-k)\eta }{r_{j}r_{i}}\Big ) \\&\quad \le \frac{r_{j}r_{i}}{(n-k)\eta }E\Big ( \max _{1\le l,s\le p}\Big \vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E(\zeta _{m,i,j}^{l,s})\Big \vert \Big ) \\&\quad \le \frac{p^{2/\beta }r_{j}r_{i}}{(n-k)\eta }\max _{1\le l,s\le p}\Big \Vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E(\zeta _{m,i,j}^{l,s})\Big \Vert _{\beta } \\&\quad \le \frac{p^{2/\beta }}{(n-k)\eta }r_{j}r_{i}\max _{1\le l,s\le p}\Big \Vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E(\zeta _{m,i,j}^{l,s})\Big \Vert _{\beta }\text {.} \end{aligned}$$

For fixed \(i,j,l,s\) the sequence \((\zeta _{m,i,j}^{l,s})_{m}\) is i.i.d. and using the moment bound (Petrov 1995) and (NGa\(_{\beta }\)), we obtain

$$\begin{aligned} \Big \Vert \sum _{m=k+1}^{n}\zeta _{m,i,j}^{l,s}-E(\zeta _{m,i,j}^{l,s})\Big \Vert _{\beta }\le C_{\beta }n^{1/2}\text {.} \end{aligned}$$

Hence, we get (23), as desired. \(\square \)

Proof of Lemma 1

From Lemma 4 under (SGauss), it follows that

$$\begin{aligned} P\Big ( \Big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\Big \Vert _{\infty }>\eta \Big ) \le 2p^{2}\sum _{i,j=0}^{\infty }\exp \Big ( - \frac{( n-k) ^{2}\eta ^{2}}{2\sigma ^{2}r_{j}^{2}r_{i}^{2}}\Big ) \text { .} \end{aligned}$$

Since for all \(x>0\), we have \(\exp ( -x) <x^{-\gamma }\) for some \( \gamma >1\), it follows that

$$\begin{aligned} \sum _{i,j=0}^{\infty }\exp \Big ( -\frac{( n-k) ^{2}\eta ^{2}}{2\sigma ^{2}r_{j}^{2}r_{i}^{2}}\Big ) <\frac{( 2\sigma ^{2}) ^{\gamma }}{( n-k) ^{2\gamma }\eta ^{2\gamma }}\Big ( \sum _{j=0}^{\infty }r_{j}^{2\gamma }\Big ) ^{2}\text {.} \end{aligned}$$

Putting \(\eta =\sqrt{n^{-1} \log p}\), we obtain

$$\begin{aligned} p^{2}\sum _{i,j=0}^{\infty }\exp \Big ( -\frac{( n-k) ^{2}\eta ^{2}}{ 2\sigma ^{2}r_{j}^{2}r_{i}^{2}}\Big )= & {} {\mathcal {O}}\Big ( \frac{p^{2}}{( n-k) ^{2\gamma }\eta ^{2\gamma }}\Big ) \\= & {} {\mathcal {O}}\Big ( \frac{p^{2}}{n^{\gamma }\log ^{\gamma }p}\Big ) ={\mathcal {O}}\Big ( \frac{1 }{\log ^{\gamma }p}\Big ) =o(1) \end{aligned}$$

as \(n\rightarrow \infty \). Therefore, (6) holds for \(c_{n}=\sqrt{ n^{-1} \log p}\). Under (NGa\(_{\beta }\)) and from Lemma 4, we have

$$\begin{aligned} P\Big ( \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\big \Vert _{\infty }>\eta \Big ) \le C_{\beta }C^{*}\frac{p^{2/\beta }n^{1/2}}{( n-k) \eta }\text {.} \end{aligned}$$

Putting \(\eta =p^{2/\beta }\sqrt{n^{-1} \log p}\), we obtain

$$\begin{aligned} \frac{p^{2/\beta }n^{1/2}}{( n-k) \eta }\sim \frac{p^{2/\beta }}{ n^{1/2}\eta }=\frac{1}{\sqrt{\log p}}\rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \). \(\square \)

Remark C

From Lemma A.3 (Bickel and Levina 2008b) under the assumption that \(({\varepsilon }_{t})\) is Gaussian and (A5), we may deduce that for some \(\delta >0\) and any \(\left| \eta \right| \le \delta \),

$$\begin{aligned} P\Big ( \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }} _{k}\big \Vert _{\infty }>\eta \Big ) \le C_{1}^{*}\exp (-C_{2}^{*}( n-k) \eta ^{2}) \end{aligned}$$

for some constants \(C_{1}^{*}\), \(C_{2}^{*}>0\). Reasoning as in the proof of Lemma 1, we may obtain (6) for \(c_{n}=\sqrt{ n^{-1} \log p}\).

Proof of Theorem 1

From the inequality in (Bhattacharjee and Bose (2014b), p. 280), we find that

$$\begin{aligned} \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma } _{k}\big \Vert _{2}\le (2l_{n}+1)\big \Vert \varvec{\hat{\Gamma }}_{k}- \varvec{\Gamma }_{k}\big \Vert _{\infty }+T\big ( \varvec{\Gamma } _{k},l_{n}\big ) . \end{aligned}$$
(25)

From Lemma 3, we have

$$\begin{aligned} \begin{aligned} T\big ( \mathbf {\Gamma }_{k},l_{n}\big ) =\mathcal {O}(\big \Vert \varvec{\Sigma }\big \Vert _{(1,1)}l_{n}^{-\alpha }) \end{aligned} \end{aligned}$$
(26)

for any \(l_{n}\rightarrow \infty \) as \(n\rightarrow \infty \). From Lemma 1 (see also Remark C when (Gauss) holds), we get

$$\begin{aligned} \big \Vert {\varvec{{\hat{\Gamma }}}}_{k}-{\varvec{\Gamma }}_{k}\big \Vert _{\infty }={\mathcal {O}}_{P}\big ( c_{n}\big ) \end{aligned}$$
(27)

for \(c_{n}\) as in Lemma 1. Consequently, due to (25)-(27), we have

$$\begin{aligned} \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma } _{k}\big \Vert _{2}=\mathcal {O}_{P}\big ( l_{n}c_{n}+\big \Vert \varvec{ \Sigma }\big \Vert _{(1,1)}l_{n}^{-\alpha }\big ) \text {.} \end{aligned}$$

Putting \(l_{n}=c_{n}^{-\frac{1}{\alpha +1}}\), we obtain

\(\big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma } _{k}\big \Vert _{2}=\mathcal {O}_{P}\big ( \big \Vert \varvec{\Sigma } \big \Vert _{(1,1)}l_{n}^{-\alpha }\big ) =\mathcal {O}_{P}\big ( \big \Vert \varvec{\Sigma }\big \Vert _{(1,1)}\big ( c_{n}\big ) ^{\frac{\alpha }{ \alpha +1}}\big ) \) and (4) holds.

Proof of Corollary 1

Reasoning as in (Bhattacharjee and Bose (2014b), Section 6.2), we have: if \({\varvec{A}}^{-1}\) exists and \(\big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert < \big \Vert {\varvec{A}}^{-1}\big \Vert ^{-1}\), then \(\big \Vert {\varvec{B}}^{-1}-{\varvec{A}}^{-1}\big \Vert \le \frac{\big \Vert {\varvec{A}}^{-1}\big \Vert ^{2}\big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert }{ 1-\big \Vert {\varvec{A}}^{-1}\big \Vert \big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert }\). Put \({\varvec{A}}=\varvec{ \Gamma }_{k}\) and \({\varvec{B}}=\mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})\). Since \( \big \Vert \varvec{\Gamma }_{k}^{-1}\big \Vert _{2}=\mathcal {O}\big ( 1\big ) \) and, from Theorem 1, \(\big \Vert \mathbf {B}_{l_{n}}( \varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma }_{k}\big \Vert _{2}=\mathcal {O}_{ P}\big ( a_{n}\big ) \) with \(a_{n}\rightarrow 0\), it follows that for large n we have \(\big \Vert {\varvec{A}}^{-1}\big \Vert \big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert < 1\) with probability tending to one. Therefore for some \(C>0\) and large n, we find that

$$\begin{aligned} \big \Vert \mathbf {B}_{l_{n}}^{-1}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma }_{k}^{-1}\big \Vert _{2}\le C\big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{ \Gamma }}_{k})-\varvec{\Gamma }_{k}\big \Vert _{2}\text {.} \end{aligned}$$

Hence, directly from (4), we have (5).
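The matrix-inverse perturbation bound used above can be sanity-checked numerically; the following sketch uses an arbitrary well-conditioned matrix and a small perturbation (all choices illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
p = 20
A = np.eye(p) + 0.3 * rng.standard_normal((p, p)) / np.sqrt(p)   # well-conditioned test matrix
B = A + 0.01 * rng.standard_normal((p, p)) / np.sqrt(p)          # small perturbation of A

op = lambda M: np.linalg.norm(M, 2)                              # operator norm
Ainv = np.linalg.inv(A)
lhs = op(np.linalg.inv(B) - Ainv)
rhs = op(Ainv) ** 2 * op(A - B) / (1 - op(Ainv) * op(A - B))
print(lhs, "<=", rhs)   # the bound holds whenever ||A^{-1}|| * ||A - B|| < 1
```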

Proof of Proposition 1

From (25), we have for any \(\eta >0\),

$$\begin{aligned} P\Big ( \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{ \Gamma }_{k}\big \Vert _{2}>\eta \Big ) \le P\Big ( (2l_{n}+1)\big \Vert \varvec{\hat{\Gamma }}_{k}-\varvec{\Gamma }_{k}\big \Vert _{\infty }+T\Big ( \varvec{\Gamma }_{k},l_{n}\big ) >\eta \Big ) . \end{aligned}$$

Using (21), we obtain

$$\begin{aligned} P\Big ( \big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{ \Gamma }_{k}\big \Vert _{2}>\eta \Big ) \le P\Big ( \big \Vert \varvec{\hat{ \Gamma }}_{k}-\varvec{\Gamma }_{k}\big \Vert _{\infty }>\frac{\eta -(\tilde{C }_{1}\big \Vert \varvec{\Sigma }\big \Vert _{(1,1)}+\tilde{C} _{2})l_{n}^{-\alpha }}{2l_{n}+1}\Big ) \text {.} \end{aligned}$$

Therefore, from Lemma 4, we deduce (7)-(8).

Proof of Proposition 2

First, we will show (11) by induction. For \(j=1\), under the condition of class \(\mathcal {A}\), \(\left\| \varvec{\Psi }_{j}\right\| _{(1,1)}=\left\| \varvec{\Psi }_{1}\right\| _{(1,1)}=\left\| \mathbf {A}_{1}\right\| _{(1,1)}\le C_{1}\delta \) and (11) holds for \(j=1\). From (10), \(C_{1}r<1\) and the induction assumption that \( \left\| \varvec{\Psi }_{i}\right\| _{(1,1)}\le C_{1}\delta ^{i}\) for all \(i\le j\), we have

$$\begin{aligned} \Vert \varvec{\Psi }_{j+1}\Vert _{(1,1)}\le & {} \sum _{i=1}^{\min (j+1,r)}\Vert \mathbf {A}_{i}\varvec{\Psi }_{j+1-i}\Vert _{(1,1)}\le \sum _{i=1}^{\min (j+1,r)}\Vert \varvec{A}_{i}\Vert _{(1,1)}\Vert \varvec{\Psi }_{j+1-i}\Vert _{(1,1)} \\\le & {} \sum _{i=1}^{\min (j+1,r)}C_{1}^2\delta ^{i}\delta ^{j+1-i}=C_{1}^{2}\sum _{i=1}^{\min (j+1,r)}\delta ^{j+1}\le C_{1}^{2}r\delta ^{j+1}\le C_{1}\delta ^{j+1}\text {.} \end{aligned}$$

In a similar manner, we have \(\big \Vert \varvec{\Psi }_{j}^{^{\prime }}\big \Vert _{(1,1)}\le C_{1}\delta ^{j}\). Hence, (11) is proved. Next, by induction we will show (12). For \(j=1\), \(T\left( \varvec{ \Psi }_{j},t\right) =T\left( \varvec{\Psi }_{1},t\right) =T\left( \mathbf {A} _{1},t\right) \le C_{2}\delta t^{-\alpha }\) under the condition of class \( \mathcal {A}\). Hence, (12) is satisfied for \(j=1\). By (10) and Lemma 2(ii),

$$\begin{aligned} T( \varvec{\Psi }_{j+1},t)\le & {} \sum _{i=1}^{\min (j+1,r)}T( \mathbf {A}_{i}\varvec{\Psi }_{j+1-i},t) \\\le & {} \sum _{i=1}^{\min (j+1,r)}\{\Vert \mathbf {A}_{i}\Vert _{(1,1)}T(\varvec{\Psi }_{j+1-i},t/2)+\Vert \varvec{\Psi } _{j+1-i}\Vert _{(1,1)}T(\mathbf {A}_{i},t/2)\}\text {.} \end{aligned}$$

Then, by the induction assumption that \(T\left( \varvec{\Psi }_{i},t\right) \le C_{2}i\delta ^{i}t^{-\alpha }\) for all \(i\le j\) and under the conditions of class \(\mathcal {A}\) (\(C_{1}=1/\left( 2^{\alpha }r\right) \)) it follows that

$$\begin{aligned} T(\varvec{\Psi }_{j+1},t)\le & {} \sum _{i=1}^{\min (j+1,r)}\big \{ C_{1}\delta ^{i}C_{2}(j+1-i)\delta ^{j+1-i}2^{\alpha }t^{-\alpha }+C_{1}\delta ^{j+1-i}C_{2}\delta ^{i}2^{\alpha }t^{-\alpha }\big \} \\\le & {} \sum _{i=1}^{\min (j+1,r)}\big \{ C_{1}C_{2}j\delta ^{j+1}2^{\alpha }t^{-\alpha }+C_{1}C_{2}\delta ^{j+1}2^{\alpha }t^{-\alpha }\big \} \\\le & {} rC_{1}C_{2}2^{\alpha }\delta ^{j+1}t^{-\alpha }(j+1)=C_{2}(j+1)\delta ^{j+1}t^{-\alpha }\text {.} \end{aligned}$$

By a similar reasoning, we obtain \(T(\varvec{\Psi }_{j}^{^{\prime }},t)\le C_{2}j\delta ^{j}t^{-\alpha }\). Hence, by induction the proof of (12) is complete.

Lemma 5

For any matrix \(\mathbf {A}\),

$$\begin{aligned} T\left( \mathbf {A}^{j},t\right) \le j\left\| \mathbf {A}\right\| _{(1,1)}^{j-1}T(\mathbf {A},t/2^{j-1}) \end{aligned}$$
(28)

for all \(t>0\) and \(j=1,2,...\)

Proof

Clearly for \(j=1\) inequality (28) holds. Now, we assume that (28) is true for some j. From the induction assumption and Lemma 2 (ii),

$$\begin{aligned} \begin{aligned} T\left( \mathbf {A}^{j+1},t\right) \le&{} \left\| \mathbf {A} ^{j}\right\| _{(1,1)}T(\mathbf {A},t/2)+\left\| \mathbf {A} \right\| _{(1,1)}T(\mathbf {A}^{j},t/2) \\\le&{} \left\| \mathbf {A}\right\| _{(1,1)}^{j}T(\mathbf {A},t/2 )+j\left\| \mathbf {A}\right\| _{(1,1)}^{j}T(\mathbf {A},t/2^{j}) \\\le&{} \left\| \mathbf {A}\right\| _{(1,1)}^{j}T(\mathbf {A},t /2^{j})+j\left\| \mathbf {A}\right\| _{(1,1)}^{j}T(\mathbf {A},t /2^{j}) \\\le&{} (j+1)\left\| \mathbf {A}\right\| _{(1,1)}^{j}T(\mathbf {A},t /2^{j})\text{. } \end{aligned} \end{aligned}$$

By induction, we see that (28) holds for all positive integers j.