
Variable selection in high-dimensional sparse multiresponse linear regression models


Abstract

We consider variable selection in high-dimensional sparse multiresponse linear regression models, in which a q-dimensional response vector has a linear relationship with a p-dimensional covariate vector through a sparse coefficient matrix \(B\in R^{p\times q}\). We propose a consistent procedure to identify the nonzeros in B. The procedure consists of two major steps: the first detects all the nonzero rows of B, and the second further identifies its individual nonzero cells. The first step is an extension of Orthogonal Matching Pursuit (OMP) and the second adopts a bootstrap strategy. The theoretical properties of the proposed procedure are established, and extensive numerical studies are presented to compare its performance with that of representative existing methods.


References

  • Buehlmann P (2006) Boosting for high-dimensional linear models. Ann. Stat. 34(2):559–583

  • Cai TT, Li H, Liu W, Xie J (2013) Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika 100(1):139–156

  • Chun H, Keleş S (2009) Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics 182(1):79–90

  • Chun H, Keleş S (2010) Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. Ser. B 72(1):3–25

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann. Stat. 32(2):407–499

  • Ing C, Lai T (2011) A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Stat. Sin. 21(4):1473

  • Jia Z, Xu S (2007) Mapping quantitative trait loci for expression abundance. Genetics 176(1):611–623

  • Johnsson T (1992) A procedure for stepwise regression analysis. Stat. Pap. 33(1):21–29

  • Liu J, Ma S, Huang J (2012) Penalized methods for multiple outcome data in genome-wide association studies. Technical report

  • Luo S, Chen Z (2013) Extended BIC for linear regression models with diverging number of relevant features and high or ultra-high feature spaces. J. Stat. Plan. Infer. 143(3):494–504

  • Luo S, Chen Z (2014) Sequential lasso cum EBIC for feature selection with ultra-high dimensional feature space. J. Am. Stat. Assoc. 109(507):1229–1240

  • Lutoborski A, Temlyakov V (2003) Vector greedy algorithms. J. Complex. 19(4):458–473

  • Ma S, Huang J, Song X (2011) Integrative analysis and variable selection with multiple high-dimensional data sets. Biostatistics 12(4):763–775

  • Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann. Stat. 21(1):255–285

  • Obozinski G, Wainwright MJ, Jordan MI (2011) Support union recovery in high-dimensional multivariate regression. Ann. Stat. 39(1):1–47

  • Özkale MR (2015) Predictive performance of linear regression models. Stat. Pap. 56(2):531–567

  • Peng J, Zhu J, Bergamaschi A, Han W, Noh D-Y, Pollack JR, Wang P (2010) Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 4(1):53–77

  • Rothe G (1986) Some remarks on bootstrap techniques for constructing confidence intervals. Stat. Pap. 27(1):165–172

  • Similä T, Tikka J (2006) Common subset selection of inputs in multiresponse regression. In: Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN'06). IEEE, pp 1908–1915

  • Similä T, Tikka J (2007) Input selection and shrinkage in multiresponse linear regression. Comput. Stat. Data Anal. 52(1):406–422

  • Temlyakov VN (2000) Weak greedy algorithms. Adv. Comput. Math. 12(2):213–227

  • Tropp J, Gilbert A, Strauss M (2006) Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process. 86(3):572–588

  • Turlach B, Venables W, Wright S (2005) Simultaneous variable selection. Technometrics 47(3):349–363

  • Wang J (2015) Joint estimation of sparse multivariate regression and conditional graphical models. Stat. Sin. pp 831–851

  • Yang C, Wang L, Zhang S, Zhao H (2013) Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping. Bioinformatics


Author information


Correspondence to Shan Luo.

Additional information

This research was supported by the National Natural Science Foundation of China (NSFC, Grant 11401378) and the Shanghai Jiao Tong University start-up fund for special researchers (WF220407103).

Appendix: Technical proofs


In this section, we provide technical proofs of our main theorems. For the reader's convenience, we restate some useful notation and conditions.

For a given K, if there exists \(k\le K\) such that \({S}_{0}\subseteq s_k\), define

$$\begin{aligned} \tilde{K}=\min \left\{ k:1\le k\le K, {S}_{0}\subseteq s_k\right\} , \end{aligned}$$
(7)

otherwise, \(\tilde{K}\) is defined to be K.

We assume the following conditions:

(C1):

\(\ln p=o(n^{1/3})\);

(C2):

The predictor vector \({x}\) and the error vector \({e}\) satisfy

(C2.1):

the covariates in \({x}\) have constant variance 1 and pairwise correlations bounded away from \(-1\) and 1.

(C2.2):

\(\sigma _{\max }\equiv \max _{1 \le j,k\le p} \sigma ({x}_j{x}_k) < \infty \) where \(\sigma ({x}_j{x}_k)\) denotes the standard deviation of \({x}_j{x}_k\).

(C2.3):

\(\max _{1\le j,k\le p}{E}\exp (t{x}_j{x}_k) \) and \(\max _{1\le i\le q,1\le j\le p}{E}\exp (t{x}_j{e}^i) \) are finite for t in a neighborhood of zero.

(C3):

\(\sum _{1\le i\le p}\sum _{1\le j\le q}|{\beta }_{ij}|\le c\) where c is a positive constant.

(C4):

There exists a constant \(\delta >0\) such that

$$\begin{aligned} \min _{1\le |A|\le K}\lambda _{\min }(\varGamma _{A,A})\ge \delta \end{aligned}$$
(8)

for K satisfying \(K=O\left( q^{-1}\sqrt{n/\ln p}\right) .\)

(C5):

For the K in C4, there exists \(0<\kappa <1\) satisfying \(n^{-\kappa }K\rightarrow +\infty \), \(n^{1-2\kappa }/\ln p\rightarrow +\infty \) and

$$\begin{aligned} \liminf _{n\rightarrow +\infty }n^{\kappa }\min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2^2>0. \end{aligned}$$

Denote \(\tilde{{y}}_{A,i}={x}_{A}\varGamma _{A,A}^{-1}{E}(x_{A}^{\top }y_i)\), \(\hat{{y}}_{A,i}={x}_{A}\hat{\varGamma }_{A,A}^{-1}(n^{-1}X_{A}^{\top }Y_i)\), and write \(\tilde{{y}}_{k,i}=\tilde{{y}}_{s_k,i}\), \(\hat{{y}}_{k,i}=\hat{{y}}_{s_k,i}\).
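
To make these quantities concrete, the following minimal numpy sketch computes the sample fitted values \(\hat{{y}}_{A,i}\) for a candidate set A; since \(\hat{\varGamma }_{A,A}=n^{-1}X_A^{\top }X_A\), they reduce to the least squares projection \({H}_0(A)Y_i\). The function name and the toy data are our own illustration, not part of the paper.

```python
import numpy as np

def fitted_values(X, Y, A):
    """Sample analogue of y_hat_{A,i}: the projection H_0(A) Y,
    written via Gamma_hat_{A,A} = X_A^T X_A / n as in the text."""
    n = X.shape[0]
    XA = X[:, A]                                     # n x |A| submatrix
    gamma_hat = XA.T @ XA / n                        # Gamma_hat_{A,A}
    coef = np.linalg.solve(gamma_hat, XA.T @ Y / n)  # = (X_A^T X_A)^{-1} X_A^T Y
    return XA @ coef                                 # n x q fitted values

# toy illustration (all dimensions arbitrary)
rng = np.random.default_rng(0)
n, p, q = 100, 20, 3
X = rng.standard_normal((n, p))
B = np.zeros((p, q)); B[[0, 1], :] = 1.0             # two nonzero rows of B
Y = X @ B + 0.1 * rng.standard_normal((n, q))
print(fitted_values(X, Y, [0, 1]).shape)             # (100, 3)
```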

Theorem 1

Under assumptions C1–C5, we have

$$\begin{aligned} \sum \limits _{1\le i\le q}{E}\left( (y_i-\hat{{y}}_{k,i})^2|(Y,X)\right) =O_p\left( \dfrac{1}{m}+q\sqrt{\dfrac{\ln p}{n}}+\dfrac{m\ln p}{n}\right) . \end{aligned}$$

When \(m=O(q^{-1}\sqrt{n/\ln p})\), the right hand side is \(O_p(m^{-1}).\)

Proof of Theorem 1

Denote

$$\begin{aligned} \mathrm{I}= \sum \limits _{1\le i\le q}{E}\left( (y_i-\tilde{{y}}_{k,i})^2|(Y,X)\right) ,\quad \mathrm{II}= \sum \limits _{1\le i\le q}{E}\left( (\hat{{y}}_{k,i}-\tilde{{y}}_{k,i})^2|(Y,X)\right) . \end{aligned}$$

First, we focus on \(\mathrm{I}\). For \(A\subseteq \{1,2,\ldots ,p\}\), \(1\le i\le q\), \(1\le j\le p\), denote by \(E_i\) the ith column of the error matrix E and

$$\begin{aligned} {\mu }_{A,j,i}={E}\left( {x}_j(y_i-\tilde{{y}}_{A,i})|A\right) ,\quad \hat{{\mu }}_{A,j,i}=\left( \dfrac{1}{n}X_j^{\top }X_j\right) ^{-1/2}\left( \dfrac{1}{n}X_j^{\top }\left( {I}-{H}_0(A)\right) y_i\right) , \end{aligned}$$
(9)

for simplicity, denote them by \({\mu }_{k,j,i}, \hat{{\mu }}_{k,j,i}\) when \(A=s_k\). Since \({E}\left( {x}_{s_k}^{\top }\left( y_i-\tilde{{y}}_{k,i}\right) |(Y,X)\right) =0\), for any k,

$$\begin{aligned} \begin{aligned} \mathrm{I}=&\sum \limits _{1\le i\le q}{E}\left( \sum _{j\notin s_k}\left( y_i-\tilde{{y}}_{k,i}\right) {x}_j{\beta }_{ji}|(Y,X)\right) = \sum _{j\notin s_k}\left( \sum \limits _{1\le i\le q}{\mu }_{k,j,i}{\beta }_{ji}\right) \\ \le&\left[ \max \limits _{1\le j\le p}\left( \sum \limits _{1\le i\le q}\left( {\mu }_{k,j,i}\right) ^2\right) ^{1/2}\right] \times \left( \sum _{j\notin s_k}\left( \sum \limits _{1\le i\le q}\left( {\beta }_{ji}\right) ^2\right) ^{1/2}\right) . \end{aligned} \end{aligned}$$
(10)

Now it suffices to bound the first factor on the right-hand side of the last inequality. Note that, by the definition of \(\hat{j}_k\),

$$\begin{aligned} \sum \limits _{1\le i\le q}\left( \hat{{\mu }}_{k-1,j_k,i}\right) ^2\ge \max _{1\le j\le p}\sum \limits _{1\le i\le q}\left( \hat{{\mu }}_{k-1,j,i}\right) ^2. \end{aligned}$$

Hence, the triangle inequality implies

$$\begin{aligned} \left( \sum \limits _{1\le i\le q}\left( {\mu }_{k-1,j_k,i}\right) ^2\right) ^{1/2}&\ge \left( \sum \limits _{1\le i\le q}\left( \hat{{\mu }}_{k-1,j_k,i}\right) ^2\right) ^{1/2}-\left( \sum \limits _{1\le i\le q}\left( \hat{{\mu }}_{k-1,j_k,i}-{\mu }_{k-1,j_k,i}\right) ^2\right) ^{1/2}\\&\ge \max _{1\le j\le p}\left( \sum \limits _{1\le i\le q}\left( \hat{{\mu }}_{k-1,j,i}\right) ^2\right) ^{1/2}-\sqrt{q}\max _{|A|\le k-1,j\notin A,1\le i\le q}|\hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i}|\\&\ge \max _{1\le j\le p}\left( \sum \limits _{1\le i\le q}\left( {\mu }_{k-1,j,i}\right) ^2\right) ^{1/2}-\max _{|A|\le k-1,j\notin A}\left( \sum \limits _{1\le i\le q}\left( \hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i}\right) ^2\right) ^{1/2}\\&\qquad -\sqrt{q}\max _{|A|\le k-1,j\notin A,1\le i\le q}|\hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i}|\\&\ge \max _{1\le j\le p}\left( \sum \limits _{1\le i\le q}\left( {\mu }_{k-1,j,i}\right) ^2\right) ^{1/2}-2\sqrt{q}\max _{|A|\le k-1,j\notin A,1\le i\le q}|\hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i}|. \end{aligned}$$
(11)

On the other hand,

$$\begin{aligned} \left( \sum \limits _{1\le i\le q}\left( {\mu }_{k-1,j_k,i}\right) ^2\right) ^{1/2}\le \max _{1\le j\le p}\left( \sum \limits _{1\le i\le q}\left( {\mu }_{k-1,j,i}\right) ^2\right) ^{1/2}. \end{aligned}$$

For any \(0<\xi <1\) and \(C>0\), let \(\tilde{\xi }=2/(1-\xi )\) and define

$$\begin{aligned} A_n(m)&=\left\{ \max _{|A|\le m-1,j\notin A,1\le i\le q}|\hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i}|\le C\sqrt{\dfrac{\ln p}{n}}\right\} ,\\ B_n(m)&=\left\{ \min _{0\le k\le m-1}\max _{1\le j\le p}\left( \sum _{1\le i\le q}\left( {\mu }_{s_k,j,i}\right) ^2\right) ^{1/2}>\tilde{\xi }C\sqrt{\dfrac{q\ln p}{n}}\right\} , \end{aligned}$$

then

$$\begin{aligned} \sum \limits _{1\le i\le q}{E}\left( (y_i-\tilde{{y}}_{k,i})^2I(A_n(m)\cap B_n^c(m))|(Y,X)\right)&= O(q\sqrt{\ln p/n}),\\ \sum \limits _{1\le i\le q}{E}\left( (y_i-\tilde{{y}}_{k,i})^2I(A_n(m)\cap B_n(m))|(Y,X)\right)&= O(m^{-1}), \end{aligned}$$

where the second bound follows from Theorem 3 in Temlyakov (2000). By direct computation, we have

$$\begin{aligned} \hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i} =\dfrac{n^{-1}X_j^{\top }\left( {I}-{H}_0(A)\right) {E}_i}{(n^{-1}X_j^{\top }X_j)^{1/2}}+\sum \limits _{r\notin A}{\beta }_{ri}\left\{ \dfrac{\hat{\varGamma }_{jr|A}}{(n^{-1}X_j^{\top }X_j)^{1/2}}-\varGamma _{jr|A}\right\} . \end{aligned}$$

From Lemma 1 in Luo and Chen (2014), under C1 to C5, when \(\max (\ln m,\ln q)=O(\ln p)\), we have

$$\begin{aligned} \max _{1\le j\le p}|n^{-1}X_j^{\top }X_j-1|=O_p\left( \sqrt{\dfrac{\ln p}{n}}\right) ,\max _{|A|\le m,j,r\notin A}|\hat{\varGamma }_{jr|A}- \varGamma _{jr|A}|=O_p\left( \sqrt{\dfrac{\ln p}{n}}\right) , \end{aligned}$$

and also,

$$\begin{aligned} \max _{|A|\le m,j,r\notin A}|n^{-1}X_j^{\top }\left( {I}-{H}_0(A)\right) {E}_i|=O_p\left( \sqrt{\dfrac{\ln p}{n}}\right) . \end{aligned}$$

Consequently,

$$\begin{aligned} \max _{|A|\le m-1,j\notin A,1\le i\le q}|\hat{{\mu }}_{A,j,i}-{\mu }_{A,j,i}|=O_p\left( \sqrt{\dfrac{\ln p}{n}}\right) . \end{aligned}$$
(12)

Now we focus on \(\mathrm{II}\). Note that

$$\begin{aligned} \hat{{y}}_{k,i}-\tilde{{y}}_{k,i}={x}_{s_k}\left( \hat{\varGamma }_{s_k,s_k}^{-1}(n^{-1}X_{s_k}^{\top }Y_i)-\varGamma _{s_k,s_k}^{-1}{E}({x}_{s_k}^{\top }y_i)\right) . \end{aligned}$$

From the above discussion, all components of \(\hat{\varGamma }_{s_k,s_k}^{-1}(n^{-1}X_{s_k}^{\top }Y_i)-\varGamma _{s_k,s_k}^{-1}{E}({x}_{s_k}^{\top }y_i)\) are uniformly \(O_p(\sqrt{\ln p/n})\). Therefore, \(\mathrm{II}=O_p\left( m \ln p/n\right) .\) The desired result is obtained. \(\square \)
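
The proof is driven by the quantities \(\hat{{\mu }}_{A,j,i}\) in (9) and the greedy choice of the next feature maximizing \(\sum _{1\le i\le q}\hat{{\mu }}_{A,j,i}^2\). As a reading aid, here is a minimal numpy sketch of one such greedy step; it is our own rendering of the selection rule, with hypothetical helper names, not the authors' code.

```python
import numpy as np

def next_feature(X, Y, A):
    """One greedy step: evaluate mu_hat_{A,j,i} from (9) for each j not in A
    and return the j maximizing sum_i mu_hat_{A,j,i}^2."""
    n = X.shape[0]
    if A:                                      # residual (I - H_0(A)) Y
        XA = X[:, A]
        R = Y - XA @ np.linalg.lstsq(XA, Y, rcond=None)[0]
    else:
        R = Y
    best_j, best_score = None, -np.inf
    for j in range(X.shape[1]):
        if j in A:
            continue
        xj = X[:, j]
        mu_hat = (xj @ R / n) / np.sqrt(xj @ xj / n)   # q-vector over responses
        score = float(np.sum(mu_hat ** 2))
        if score > best_score:
            best_j, best_score = j, score
    return best_j

rng = np.random.default_rng(1)
n, p, q = 200, 30, 4
X = rng.standard_normal((n, p))
B = np.zeros((p, q)); B[5, :] = 2.0; B[7, :2] = 1.5    # nonzero rows 5 and 7
Y = X @ B + 0.1 * rng.standard_normal((n, q))
A = []
for _ in range(2):
    A.append(next_feature(X, Y, A))
print(A)                                   # rows 5 and 7 should be selected
```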

Theorem 2

Under assumptions C1–C5, MOMP possesses the sure screening property, that is,

$$\begin{aligned} P\left( {S}_{0}\subseteq s_{K}\right) \rightarrow 1\;\text {as}\; n\rightarrow +\infty \end{aligned}$$

for K defined in C4.

Proof of Theorem 2

Let \(\tilde{{\beta }}_{ji}(A)\) be the coefficient of \({x}_j\) in the best linear predictor \(\tilde{{y}}_{A,i}\) defined above, and set \(\tilde{{\beta }}_{ji}(A)=0\) if \(j\notin A\). Note that

$$\begin{aligned} \sum \limits _{1\le i\le q}{E}\left( (y_i-\tilde{{y}}_{m,i})^2|(Y,X)\right) =\sum \limits _{1\le i\le q}{E}\left( \left( \sum \limits _{1\le j\le p}(\beta _{ji}-\tilde{{\beta }}_{ji}(s_m)){x}_j\right) ^2|(Y,X)\right) . \end{aligned}$$

The inequality

$$\begin{aligned} \sum \limits _{1\le i\le q}\sum \limits _{1\le j\le p}|{\beta }_{ji}|\ge |{S}_{0}|\min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _1\ge |{S}_{0}|\min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2 \end{aligned}$$
(13)

together with C3 and C5, implies that \(|{S}_{0}|=O(n^{\kappa /2})\) and hence \(|{S}_{0}\cup s_m| = O(m+n^{\kappa /2})\). It follows that, if \(s_m^c\cap {S}_{0}\ne \emptyset \) and \(m=K\), then

$$\begin{aligned} \begin{aligned} \sum \limits _{1\le i\le q}{E}\left( (y_i-\tilde{{y}}_{m,i})^2|(Y,X)\right) \ge&\, {E}\sum \limits _{1\le i\le q}\sum \limits _{j\in s_m^c\cap {S}_{0}}\left( {\beta }_{ji}{x}_j\right) ^2\\ \ge&\sum \limits _{1\le i\le q}\sum \limits _{j\in s_m^c\cap {S}_{0}}\left( {\beta }_{ji}\right) ^2\lambda _{\min }\left( \varGamma _{s_m^c\cap S_{0},s_m^c\cap S_{0}}\right) \\ \ge \,&\delta \min \limits _{j\in S_{0}}\Vert \beta _j\Vert _2^2\ge Cn^{-\kappa } \end{aligned} \end{aligned}$$

for some positive constant C when n is sufficiently large. Since C5 gives \(mn^{-\kappa }\rightarrow +\infty \), this lower bound contradicts Theorem 1. Therefore, \(P\left( {S}_{0}\subseteq s_{K}\right) \rightarrow 1\) as \(n\rightarrow \infty \). \(\square \)

Theorem 3

Under the assumptions C1–C5, when \({e}\) follows a multivariate normal distribution, we have

$$\begin{aligned} \lim _{n\rightarrow +\infty } P(\hat{K}=\tilde{K})=1\;\text {where}\;\tilde{K}=\min \{1\le k\le K: S_{0}\subseteq s_k\} \end{aligned}$$

if \(\gamma \) in (2) is larger than \(1-\ln n/(2\ln p)\).

Proof of Theorem 3

Theorem 2 implies that there exists a constant \(a>0\) such that

$$\begin{aligned} \lim _{n\rightarrow +\infty }P\left( \tilde{K}\le aq^{-1}\sqrt{n/\ln p}\right) =1 \end{aligned}$$
(14)

Suppose \(j\notin A\) and \(A_1\subsetneq A_2\); then the following two identities

$$\begin{aligned} {I}- {H}_0(A\cup \{j\})&= [ {I}- {H}_0(A)]\left( {I}- \frac{ X_{j} X_j^{\top }[ {I}- {H}_0(A)] }{ X_j^{\top }[ {I}- {H}_0(A)]X_j} \right) ,\\ {H}_0(A_2)-{H}_0(A_1)&= ({I}-{H}_0(A_1))X_{A_2\cap A_1^c}\{X^{\top }_{A_2\cap A_1^c}({I}-{H}_0(A_1))X_{A_2\cap A_1^c}\}^{-1}X^{\top }_{A_2\cap A_1^c}({I}-{H}_0(A_1)), \end{aligned}$$
(15)

are crucial for proving this theorem.
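
Both identities are standard facts about orthogonal projections and can be checked numerically. The numpy snippet below verifies them on random data; it is a sanity check we added for the reader, not part of the original proof.

```python
import numpy as np

def H0(X, A):
    """Projection matrix onto the column span of X_A."""
    XA = X[:, A]
    return XA @ np.linalg.solve(XA.T @ XA, XA.T)

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 8))
I = np.eye(50)

# First identity: update of I - H_0(A) when feature j joins A
A, j = [0, 1, 2], 3
M = I - H0(X, A)
xj = X[:, [j]]
lhs = I - H0(X, A + [j])
rhs = M @ (I - (xj @ xj.T @ M) / (xj.T @ M @ xj))
print(np.allclose(lhs, rhs))    # True

# Second identity: difference of projections for nested sets A1, A2
A1, A2 = [0, 1], [0, 1, 4, 5]
D = [k for k in A2 if k not in A1]           # A2 intersect A1^c
M1 = I - H0(X, A1)
XD = X[:, D]
lhs = H0(X, A2) - H0(X, A1)
rhs = M1 @ XD @ np.linalg.solve(XD.T @ M1 @ XD, XD.T @ M1)
print(np.allclose(lhs, rhs))    # True
```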

Without loss of generality, we assume that all features and errors \(e_i\) have sample mean 0 and sample variance 1. Define

$$\begin{aligned} \begin{aligned} A_{k-1}=&n^{-1}X^{\top }_{j_{k}}\left( {I}-{H}_0(s_{k-1})\right) X_{j_k},\\ B_{k-1,i}=&n^{-1}X^{\top }_{j_{k}}\left( {I}-{H}_0(s_{k-1})\right) E_i,\\ C_{k-1,i}=&n^{-1}E_i^{\top }\left( {I}-{H}_0(s_{k-1})\right) E_i-1. \end{aligned} \end{aligned}$$

From Lemma 1 in Luo and Chen (2014), it is straightforward to obtain the following conclusions:

(A):

On one hand, \(\max \limits _{1\le k\le n}|A_{k-1}-\varGamma _{j_k,j_k|s_{k-1}}|=o_p(1)\); on the other hand,

$$\begin{aligned} A_{k-1}&\ge \lambda _{\min }\left( \hat{\varGamma }_{s_k,s_k}\right) \left[ 1+\Vert \left( X^{\top }_{s_{k-1}}X_{s_{k-1}}\right) ^{-1}X^{\top }_{s_{k-1}}X_{j_{k}}\Vert _2^2\right] \\&\ge \lambda _{\min }\left( \hat{\varGamma }_{s_k,s_k}\right) . \end{aligned}$$

When \(\lambda _{\min }\left( \varGamma _{s_k,s_k}\right) \ge \delta \), for any k-dimensional unit vector \({w}\),

$$\begin{aligned} \begin{aligned} \min \left( {w}^{\top }\hat{\varGamma }_{s_k,s_k}{w}\right) =&\min \left( {w}^{\top }\left\{ \left[ \hat{\varGamma }_{s_k,s_k}-\varGamma _{s_k,s_k}\right] +\varGamma _{s_k,s_k}\right\} {w}\right) \\ \ge&\min \left( {w}^{\top }\varGamma _{s_k,s_k}{w}\right) -\max \left( {w}^{\top }\left[ \hat{\varGamma }_{s_k,s_k}-\varGamma _{s_k,s_k}\right] {w}\right) \\ \ge&\lambda _{\min }\left( \varGamma _{s_k,s_k}\right) -\Vert {w}\Vert _1^2\max _{1\le i,j\le p}|n^{-1}X_i^{\top }X_j-\varGamma _{i,j}|\\ \ge&\delta -O_p\left( k\sqrt{\dfrac{\ln p}{n}}\right) . \end{aligned} \end{aligned}$$
(16)

That is, \(P(A_{k-1}\ge \delta )\rightarrow 1\) as \(n\rightarrow +\infty \) provided \(k\sqrt{\ln p/n}=o(1).\)

(B):

\(\max \limits _{k=O(q^{-1}\sqrt{n/\ln p}),1\le i\le q}|B_{k-1,i}|=O_p\left( \sqrt{\dfrac{\ln p+\ln q}{n}}\right) =o_p\left( \min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2\right) .\)

(C):

\(\max \limits _{k=O(q^{-1}\sqrt{n/\ln p}),1\le i\le q}|C_{k-1,i}|=O_p\left( \sqrt{\dfrac{\ln p+\ln q}{n}}\right) =o_p\left( \min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2\right) .\)

(i):

If \(\hat{K}<\tilde{K}\), \(\mathrm{EBIC}_{\gamma }(s_{\hat{K}})\le \mathrm{EBIC}_{\gamma }(s_{\tilde{K}})\) implies

$$\begin{aligned} n\ln \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}+(|\hat{K}| -|\tilde{K}|)(\ln n+2\gamma \ln p)\le 0. \end{aligned}$$
(17)

If we can show

$$\begin{aligned} P\left( n\ln \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}-1}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}-|\tilde{K}|(\ln n+2\gamma \ln p)\le 0\right) \rightarrow 0, \end{aligned}$$
(18)

then we will have \(P(\hat{K}<\tilde{K})\rightarrow 0.\) In the following, we aim to prove (18):

$$\begin{aligned} \begin{aligned}&\sum \limits _{1\le i\le q}\left\{ \dfrac{\Vert ({I}-{H}_0(s_{\tilde{K}-1}))y_i\Vert _2^2}{n}-\dfrac{\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}{n}\right\} \\&\quad =\sum \limits _{1\le i\le q}\left\{ \left( {\beta }_{j_{\tilde{K}}i}\right) ^2A_{\tilde{K}-1}+2{\beta }_{j_{\tilde{K}}i}B_{\tilde{K}-1,i}+(A_{\tilde{K}-1})^{-1}(B_{\tilde{K}-1,i})^2\right\} , \end{aligned} \end{aligned}$$
(19)

and furthermore,

$$\begin{aligned} \sum \limits _{1\le i\le q}\dfrac{\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}{n} =\sum \limits _{1\le i\le q}\left\{ C_{\tilde{K},i}+1\right\} =q\left( 1+O_p\left( \sqrt{\dfrac{\ln p+\ln q}{n}}\right) \right) . \end{aligned}$$
(20)

Hence, with probability tending to 1,

$$\begin{aligned} n\ln \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}-1}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2} \ge n\ln \left( 1+\dfrac{\min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2^2}{q}\right) \ge Cq^{-1}n^{1-\kappa }, \end{aligned}$$
(21)

for some \(0<C<1\), whereas \(|\tilde{K}|(\ln n+2\gamma \ln p)\le q^{-1}\sqrt{n\ln p}\). Combined with C5, (18) is thus proved.

(ii):

If \(\hat{K}>\tilde{K}\), \(\mathrm{EBIC}_{\gamma }(s_{\hat{K}})\le \mathrm{EBIC}_{\gamma }(s_{\tilde{K}})\) implies

$$\begin{aligned} \begin{aligned}&n\ln \left( 1+\dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2-\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}\right) \\&-(|\hat{K}| -|\tilde{K}|)(\ln n+2\gamma \ln p)\ge 0. \end{aligned} \end{aligned}$$
(22)

If we can prove that this inequality holds with a probability converging to 0, then \(P(\hat{K}>\tilde{K})=o(1).\) Note that

$$\begin{aligned} \begin{aligned} \sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2=&\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}})){E}_i\Vert _2^2,\\ \sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2=&\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}})){E}_i\Vert _2^2. \end{aligned} \end{aligned}$$

From Lemma 2 in Luo and Chen (2013), we have

$$\begin{aligned} \dfrac{\sum \limits _{1\le i\le q}\left\{ \Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2-\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2\right\} }{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}=\dfrac{2|\hat{K}-\tilde{K}|}{n}(1+o_p(1)). \end{aligned}$$

Hence, by applying the conclusions in (i),

$$\begin{aligned} \begin{aligned}&n\ln \left( 1+\dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2-\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}\right) \\&\quad \le 2|\hat{K}-\tilde{K}|\ln p. \end{aligned} \end{aligned}$$

When \(\gamma >1-\ln n/(2\ln p)\), the desired result is obtained. \(\square \)
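
To illustrate how \(\mathrm{EBIC}_{\gamma }\) selects \(\hat{K}\) along the greedy path, the sketch below scores nested models with a criterion of the form suggested by (17), \(n\ln \mathrm{RSS}(s_k)+k(\ln n+2\gamma \ln p)\), using a \(\gamma \) satisfying the condition of Theorem 3. The exact offset conventions and the toy path are our reading of (17) and assumptions made for illustration, not the paper's definitive implementation.

```python
import numpy as np

def ebic_gamma(X, Y, path, gamma):
    """Score EBIC_gamma(s_k) = n*ln(RSS(s_k)) + k*(ln n + 2*gamma*ln p)
    for k = 1, ..., len(path) along a nested path s_1, s_2, ..."""
    n, p = X.shape
    scores = []
    for k in range(1, len(path) + 1):
        XA = X[:, path[:k]]
        resid = Y - XA @ np.linalg.lstsq(XA, Y, rcond=None)[0]
        rss = float(np.sum(resid ** 2))        # pooled over all q responses
        scores.append(n * np.log(rss) + k * (np.log(n) + 2 * gamma * np.log(p)))
    return scores

rng = np.random.default_rng(3)
n, p, q = 200, 50, 3
X = rng.standard_normal((n, p))
B = np.zeros((p, q)); B[[2, 9], :] = 1.0       # two truly nonzero rows
Y = X @ B + 0.1 * rng.standard_normal((n, q))

path = [2, 9, 17, 31]                          # hypothetical greedy path
gamma = 1 - np.log(n) / (2 * np.log(p)) + 0.1  # gamma > 1 - ln n/(2 ln p)
scores = ebic_gamma(X, Y, path, gamma)
print(1 + int(np.argmin(scores)))              # K_hat; expect 2 here
```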


Cite this article

Luo, S. Variable selection in high-dimensional sparse multiresponse linear regression models. Stat Papers 61, 1245–1267 (2020). https://doi.org/10.1007/s00362-018-0989-x
