Abstract
We consider variable selection in high-dimensional sparse multiresponse linear regression models, in which a q-dimensional response vector has a linear relationship with a p-dimensional covariate vector through a sparse coefficient matrix \(B\in R^{p\times q}\). We propose a consistent procedure for identifying the nonzeros in B. The procedure consists of two major steps: the first detects all the nonzero rows of B, and the second further identifies its individual nonzero cells. The first step is an extension of Orthogonal Matching Pursuit (OMP) and the second adopts a bootstrap strategy. Theoretical properties of the proposed procedure are established. Extensive numerical studies are presented to compare its performance with that of representative existing methods.
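As a rough illustration of the row-screening idea in the first step (a greedy multiresponse analogue of OMP), the following sketch, entirely our own and not the paper's exact algorithm, adds at each iteration the covariate whose aggregate squared correlation with the current residuals, summed over the q responses, is largest; the name `momp` and all implementation details are assumptions:

```python
import numpy as np

def momp(X, Y, K):
    """Greedy multiresponse forward selection: at each step, add the
    covariate whose column has the largest total squared inner product
    with the current residual matrix across all q responses."""
    n, p = X.shape
    residual = Y.copy()
    selected = []
    for _ in range(K):
        # aggregate score of covariate j over all q responses
        scores = np.sum((X.T @ residual) ** 2, axis=1)
        scores[selected] = -np.inf          # never re-select a covariate
        selected.append(int(np.argmax(scores)))
        # refit on the current selected set and update the residuals
        Xs = X[:, selected]
        coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
        residual = Y - Xs @ coef
    return selected
```

In the paper, the stopping index along this path is then chosen by an EBIC-type criterion rather than a fixed K.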
References
Buehlmann P (2006) Boosting for high-dimensional linear models. Ann. Stat. 34(2):559–583
Cai TT, Li H, Liu W, Xie J (2013) Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika 100(1):139–156
Chun H, Keleş S (2009) Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics 182(1):79–90
Chun H, Keleş S (2010) Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. 72(1):3–25
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann. Stat. 32(2):407–499
Ing C, Lai T (2011) A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Stat. Sin. 21(4):1473
Jia Z, Xu S (2007) Mapping quantitative trait loci for expression abundance. Genetics 176(1):611–623
Johnsson T (1992) A procedure for stepwise regression analysis. Stat. Pap. 33(1):21–29
Liu J, Ma S, Huang J (2012) Penalized methods for multiple outcome data in genome-wide association studies. Technical report
Luo S, Chen Z (2013) Extended bic for linear regression models with diverging number of relevant features and high or ultra-high feature spaces. J. Stat. Plan. Infer. 143(3):494–504
Luo S, Chen Z (2014) Sequential lasso cum ebic for feature selection with ultra-high dimensional feature space. J. Am. Stat. Assoc. 109(507):1229–1240
Lutoborski A, Temlyakov V (2003) Vector greedy algorithms. J. Complex. 19(4):458–473
Ma S, Huang J, Song X (2011) Integrative analysis and variable selection with multiple high-dimensional data sets. Biostatistics 12(4):763–775
Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann. Stat. 21(1):255–285
Obozinski G, Wainwright MJ, Jordan MI (2011) Support union recovery in high-dimensional multivariate regression. Ann. Stat. 39(1):1–47
Özkale MR (2015) Predictive performance of linear regression models. Stat. Pap. 56(2):531–567
Peng J, Zhu J, Bergamaschi A, Han W, Noh D-Y, Pollack JR, Wang P (2010) Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 4(1):53–77
Rothe G (1986) Some remarks on bootstrap techniques for constructing confidence intervals. Stat. Pap. 27(1):165–172
Similä T, Tikka J (2006) Common subset selection of inputs in multiresponse regression. In: Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN'06). IEEE, pp 1908–1915
Similä T, Tikka J (2007) Input selection and shrinkage in multiresponse linear regression. Comput. Stat. Data Anal. 52(1):406–422
Temlyakov VN (2000) Weak greedy algorithms. Adv. Comput. Math. 12(2):213–227
Tropp J, Gilbert A, Strauss M (2006) Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process. 86(3):572–588
Turlach B, Venables W, Wright S (2005) Simultaneous variable selection. Technometrics 47(3):349–363
Wang J (2015) Joint estimation of sparse multivariate regression and conditional graphical models. Stat. Sin. 831–851
Yang C, Wang L, Zhang S, Zhao H (2013) Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping. Bioinformatics
This research was supported by National Natural Science Foundation of China (NSFC): 11401378 and Shanghai Jiao Tong University start-up fund for special researchers: WF220407103.
Appendix: Technical proofs
In this section, we provide technical proofs of our main theorems. For the convenience of the reader, we restate some useful notations and conditions.
For given K, if there exists a \(k\le K\) such that \({S}_{0}\subseteq s_k\), define \(\tilde{K}=\min \{k\le K:{S}_{0}\subseteq s_k\}\); otherwise, \(\tilde{K}\) is defined to be K.
We assume the following conditions:
- (C1):
\(\ln p=o(n^{1/3})\);
- (C2):
The predictor vector \({x}\) and the error vector \({e}\) satisfy
- (C2.1):
the covariates in \({x}\) have a constant variance 1 and pairwise correlations bounded away from \(\pm 1\).
- (C2.2):
\(\sigma _{\max }\equiv \max _{1 \le j,k\le p} \sigma ({x}_j{x}_k) < \infty \) where \(\sigma ({x}_j{x}_k)\) denotes the standard deviation of \({x}_j{x}_k\).
- (C2.3):
\(\max _{1\le j,k\le p}{E}\exp (t{x}_j{x}_k) \) and \(\max _{1\le i\le q,1\le j\le p}{E}\exp (t{x}_j{e}^i) \) are finite for t in a neighborhood of zero.
- (C3):
\(\sum _{1\le i\le p}\sum _{1\le j\le q}|{\beta }_{ij}|\le c\) where c is a positive constant.
- (C4):
There exists a constant \(\delta >0\) such that
$$\begin{aligned} \min _{1\le |A|\le K}\lambda _{\min }(\varGamma _{A,A})\ge \delta \end{aligned}$$(8)for K satisfying \(K=O\left( q^{-1}\sqrt{n/\ln p}\right) .\)
- (C5):
For the K in C4, there exists a \(0<\kappa <1\) satisfying \(n^{-\kappa }K\rightarrow +\infty \), \(n^{1-2\kappa }/\ln p\rightarrow +\infty \) and
$$\begin{aligned} \lim \inf _{n\rightarrow +\infty }n^{\kappa }\min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2^2>0. \end{aligned}$$
Denote \(\tilde{{y}}_{A,i}={x}_{A}\varGamma _{A,A}^{-1}\text{ E }(x_{A}^{\top }y_i),\;\; \hat{{y}}_{A,i}={x}_{A}\hat{\varGamma }_{A,A}^{-1}X_{A}^{\top }Y_i\) and \(\tilde{{y}}_{k,i}=\tilde{{y}}_{s_k,i}, \hat{{y}}_{k,i}=\hat{{y}}_{s_k,i}\).
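As a sanity check on this notation, note that with \(\hat{\varGamma }_{A,A}=n^{-1}X_A^{\top }X_A\) (and the matching \(n^{-1}\) factor on \(X_A^{\top }Y_i\)), \(\hat{{y}}_{A,i}\) is nothing but the orthogonal projection \({H}_0(A)Y_i\) of \(Y_i\) onto the column space of \(X_A\). A minimal numerical check (our own sketch, assuming that normalization):

```python
import numpy as np

def fitted_values(X_A, y_i):
    """Compute X_A * Gamma_hat^{-1} * (X_A^T y_i / n) with
    Gamma_hat = X_A^T X_A / n; the two n's cancel, so this equals the
    orthogonal projection of y_i onto the span of the columns of X_A."""
    n = X_A.shape[0]
    gamma_hat = X_A.T @ X_A / n
    return X_A @ np.linalg.solve(gamma_hat, X_A.T @ y_i / n)
```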
Theorem 1
Under assumptions C1–C5, we have
When \(m=O(q^{-1}\sqrt{n/\ln p})\), the right hand side is \(O_p(m^{-1}).\)
Proof of Theorem 1
Denote
Firstly, we focus on \(\text{ I }\). For \(A\subseteq \{1,2,\ldots ,p\},1\le i\le q, 1\le j\le p,\) denote by \(E_i\) the ith column of the error matrix E and
for simplicity, denote them by \({\mu }_{k,j,i}, \hat{{\mu }}_{k,j,i}\) when \(A=s_k\). Since \({E}\left( {x}_{s_k}^{\top }\left( y_i-\tilde{{y}}_{k,i}\right) |(Y,X)\right) =0\), we have, for any k,
Now it suffices to estimate the first term of the right hand side in the last inequality. Note that by definition of \(\hat{j}_k\),
Hence, the triangle inequality implies
on the other hand,
For any \(0<\xi <1\) and \(C>0\), let \(\tilde{\xi }=2/(1-\xi )\) and define
then
the second inequality is implied by Theorem 3 in Temlyakov (2000). By direct computation, we have
From Lemma 1 in Luo and Chen (2014), under C1 to C5, when \(\max (\ln m,\ln q)=O(\ln p)\), we have
and also,
Consequently,
Now we focus on \(\text{ II }\). Note that
from the above discussion, we see that all components of \(\hat{\varGamma }_{s_k,s_k}^{-1}X_{s_k}^{\top }Y_i-\varGamma _{s_k,s_k}^{-1}\text{ E }({x}_{s_k}y_i)\) are uniformly \(O_p(\sqrt{\ln p/n})\). Therefore, \(\text{ II }=O_p\left( m \ln p/n\right) .\) The desired result is obtained. \(\square \)
Theorem 2
Under assumptions C1–C5, MOMP possesses the sure screening property, that is,
for K defined in C4.
Proof of Theorem 2
Let \(\tilde{{\beta }}_{ji}(A)\) be the coefficient of \({x}_A\) in the best linear predictor \(\tilde{{y}}_{A,i}\) as defined in (9), and set \(\tilde{{\beta }}_{ji}(A)=0\) if \(j\notin A\). Note that
The inequality
together with C3 and C5 implies that \(|{S}_{0}|=O(n^{\kappa /2})\), so that \(|{S}_{0}\cup s_m| = O(m+n^{\kappa /2})\); it then follows from the above inequality that, if \(s_m^c\cap {S}_{0}\ne \emptyset \) and \(m=K\), then
for some positive constant C when n is sufficiently large. Since \(mn^{-\kappa }\rightarrow +\infty \) by C5, this contradicts Theorem 1. Therefore, \(P\left( {S}_{0}\subseteq s_{K}\right) \rightarrow 1\) as \(n\rightarrow \infty .\)\(\square \)
Theorem 3
Under the assumptions C1–C5, when \({e}\) follows a multivariate normal distribution, we have
if \(\gamma \) in (2) is larger than \(1-\ln n/(2\ln p)\).
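For concreteness, one hedged reading of the \(\text{ EBIC }_{\gamma }\) criterion implicit in (17), namely n times the log of the total residual sum of squares over the q responses plus a penalty of \(\ln n+2\gamma \ln p\) per selected covariate, can be sketched as follows; the function name and implementation details are our own assumptions:

```python
import numpy as np

def ebic_gamma(X, Y, support, gamma):
    """EBIC_gamma for a candidate support (list of covariate indices):
    n * log(total RSS over the q responses)
    + |support| * (log n + 2 * gamma * log p)."""
    n, p = X.shape
    if support:
        Xs = X[:, list(support)]
        coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
        rss = np.sum((Y - Xs @ coef) ** 2)
    else:
        rss = np.sum(Y ** 2)   # empty model: residual is Y itself
    return n * np.log(rss) + len(support) * (np.log(n) + 2 * gamma * np.log(p))
```

\(\hat{K}\) is then taken as the minimizer of this criterion along the MOMP path \(s_1,\ldots ,s_K\).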
Proof of Theorem 3
Theorem 2 implies that there exists a constant \(a>0\) such that
Suppose \(j\notin A\) and \(A_1\subsetneq A_2\); the following two identities
are key to the proof of this theorem.
Without loss of generality, we assume that all features and errors \(e_i\) have sample mean 0 and sample variance 1. Define
From Lemma 1 in Luo and Chen (2014), it is straightforward to have the following conclusions,
- (A):
On one hand, \(\max \limits _{1\le k\le n}|A_{k-1}-\varGamma _{j_k,j_k|s_{k-1}}|=o_p(1)\); on the other hand,
$$\begin{aligned} A_{k-1}\ge & {} \lambda _{\min }\left( \hat{\varGamma }_{s_k,s_k}\right) \left[ 1+\Vert \left( X^{\top }_{s_{k-1}}X_{s_{k-1}}\right) ^{-1}X^{\top }_{s_{k-1}}X_{j_{k}}\Vert _2^2\right] \\\ge & {} \lambda _{\min }\left( \hat{\varGamma }_{s_k,s_k}\right) \end{aligned}$$When \(\lambda _{\min }\left( \varGamma _{s_k,s_k}\right) \ge \delta \), for any k-dimensional unit vector \({w}\),
$$\begin{aligned} \begin{aligned} \min \left( {w}^{\top }\hat{\varGamma }_{s_k,s_k}{w}\right) =&\min \left( {w}^{\top }\left\{ \left[ \hat{\varGamma }_{s_k,s_k}-\varGamma _{s_k,s_k}\right] +\varGamma _{s_k,s_k}\right\} {w}\right) \\ \ge&\min \left( {w}^{\top }\varGamma _{s_k,s_k}{w}\right) -\max \left( {w}^{\top }\left[ \hat{\varGamma }_{s_k,s_k}-\varGamma _{s_k,s_k}\right] {w}\right) \\ \ge&\lambda _{\min }\left( \varGamma _{s_k,s_k}\right) -\Vert {w}\Vert _1^2\max _{1\le i,j\le p}|n^{-1}X_i^{\top }X_j-\varGamma _{i,j}|\\ \ge&\delta -O_p\left( k\sqrt{\dfrac{\ln p}{n}}\right) . \end{aligned} \end{aligned}$$(16)That is, \(P(A_{k-1}\ge \delta )\rightarrow 1\) as \(n\rightarrow +\infty \) provided \(k\sqrt{\ln p/n}=o(1).\)
- (B):
\(\max \limits _{k=O(q^{-1}\sqrt{n/\ln p}),1\le i\le q}|B_{k-1,i}|=O_p\left( \sqrt{\dfrac{\ln p+\ln q}{n}}\right) =o_p\left( \min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2\right) .\)
- (C):
\(\max \limits _{k=O(q^{-1}\sqrt{n/\ln p}),1\le i\le q}|C_{k-1,i}|=O_p\left( \sqrt{\dfrac{\ln p+\ln q}{n}}\right) =o_p\left( \min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2\right) .\)
- (i):
If \(\hat{K}<\tilde{K}\), \(\text{ EBIC }_{\gamma }(s_{\hat{K}})\le \text{ EBIC }_{\gamma }(s_{\tilde{K}}) \) implies
$$\begin{aligned} n\ln \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}+(|\hat{K}| -|\tilde{K}|)(\ln n+2\gamma \ln p)\le 0. \end{aligned}$$(17)If we can show
$$\begin{aligned} P\left( n\ln \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}-1}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}-|\tilde{K}|(\ln n+2\gamma \ln p)\le 0\right) \rightarrow 0, \end{aligned}$$(18)then we will have \(P(\hat{K}<\tilde{K})\rightarrow 0.\) In the following, we aim to prove (18):
$$\begin{aligned} \begin{aligned}&\sum \limits _{1\le i\le q}\left\{ \dfrac{\Vert ({I}-{H}_0(s_{\tilde{K}-1}))y_i\Vert _2^2}{n}-\dfrac{\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}{n}\right\} \\&\quad =\sum \limits _{1\le i\le q}\left\{ \left( {\beta }_{j_{\tilde{K}}i}\right) ^2A_{\tilde{K}-1}+2{\beta }_{j_{\tilde{K}}i}B_{\tilde{K}-1,i}+(A_{\tilde{K}-1})^{-1}(B_{\tilde{K}-1,i})^2\right\} , \end{aligned} \end{aligned}$$(19)And furthermore,
$$\begin{aligned} \sum \limits _{1\le i\le q}\dfrac{\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2}{n} =\sum \limits _{1\le i\le q}\left\{ C_{\tilde{K},i}+1\right\} =q\left( 1+O_p\left( \sqrt{\dfrac{\ln p+\ln q}{n}}\right) \right) . \end{aligned}$$(20)Hence, with probability tending to 1,
$$\begin{aligned} n\ln \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}-1}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2} \ge n\ln \left( 1+\dfrac{\min \limits _{j\in {S}_{0}}\Vert {\beta }_j\Vert _2^2}{q}\right) \ge Cq^{-1}n^{1-\kappa },\nonumber \\ \end{aligned}$$(21)for some \(0<C<1\) while \(|\tilde{K}|(\ln n+2\gamma \ln p)\le q^{-1}\sqrt{n\ln p}\). Combined with C5, (18) is thus proved.
- (ii):
If \(\hat{K}>\tilde{K}\), \(\text{ EBIC }_{\gamma }(s_{\hat{K}})\le \text{ EBIC }_{\gamma }(s_{\tilde{K}}) \) implies,
$$\begin{aligned} \begin{aligned}&n\ln \left( 1+\dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2-\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}\right) \\&-(|\hat{K}| -|\tilde{K}|)(\ln n+2\gamma \ln p)\ge 0. \end{aligned} \end{aligned}$$(22)If we can prove that this inequality holds with a probability converging to 0, then \(P(\hat{K}>\tilde{K})=o(1).\) Note that
$$\begin{aligned} \begin{aligned} \sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2=&\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}})){E}_i\Vert _2^2\\ \sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2=&\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}})){E}_i\Vert _2^2\\ \end{aligned} \end{aligned}$$From Lemma 2 in Luo and Chen (2013), we have
$$\begin{aligned} \dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2-\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}=\dfrac{2|\hat{K}-\tilde{K}|}{n}(1+o_p(1)). \end{aligned}$$Hence, by applying the conclusions in (i),
$$\begin{aligned} \begin{aligned}&n\ln \left( 1+\dfrac{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\tilde{K}}))y_i\Vert _2^2-\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}{\sum \limits _{1\le i\le q}\Vert ({I}-{H}_0(s_{\hat{K}}))y_i\Vert _2^2}\right) \\&\quad \le 2|\hat{K}-\tilde{K}|\ln p. \end{aligned} \end{aligned}$$When \(\gamma >1-\ln n/(2\ln p)\), the desired result is obtained.\(\square \)
Luo, S. Variable selection in high-dimensional sparse multiresponse linear regression models. Stat Papers 61, 1245–1267 (2020). https://doi.org/10.1007/s00362-018-0989-x