Variable selection for sparse logistic regression

Abstract

We consider the variable selection problem in a sparse logistic regression model. Inspired by the square-root Lasso, we develop a weighted score Lasso for logistic regression. The new method yields an \({\ell }_1\) estimation error bound under assumptions similar to those introduced in Bach et al. (Electron J Stat 4:384–414, 2010). Compared with the standard Lasso, the weighted score Lasso provides a direct choice for the tuning parameter. Both theoretical and simulation results confirm the satisfactory performance of the proposed method. We illustrate our methodology with a real microarray data set.
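
As a concrete illustration of the proposed estimator, the following is a minimal Python sketch (our own, not the authors' implementation): it minimizes the \(\ell _1\)-penalized exponential-weight loss \({\ell }_{\omega }\) displayed in the proof of Theorem 2 in the Appendix, using proximal gradient descent with soft-thresholding. The solver, step size, and iteration count are arbitrary illustrative choices; any \(\ell _1\) solver (e.g. coordinate descent, as in Friedman et al. 2010) could be used instead.

import numpy as np

def weighted_loss(beta, X, y):
    # ell_omega(beta) = (1/n) * sum_i [(1 - y_i) exp(x_i'beta/2) + y_i exp(-x_i'beta/2)]
    eta = X @ beta / 2.0
    return np.mean((1 - y) * np.exp(eta) + y * np.exp(-eta))

def weighted_score(beta, X, y):
    # gradient: (1/(2n)) * sum_i [(1 - y_i) exp(x_i'beta/2) - y_i exp(-x_i'beta/2)] x_i
    n = X.shape[0]
    eta = X @ beta / 2.0
    w = (1 - y) * np.exp(eta) - y * np.exp(-eta)
    return X.T @ w / (2.0 * n)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_score_lasso(X, y, lam, step=0.1, n_iter=2000):
    # proximal gradient descent for ell_omega(beta) + lam * ||beta||_1
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * weighted_score(beta, X, y), step * lam)
    return beta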

References

  • Bach F et al (2010) Self-concordant analysis for logistic regression. Electron J Stat 4:384–414

  • Belloni A, Chernozhukov V, Wang L (2011) Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4):791–806

  • Bickel PJ, Ritov Y, Tsybakov AB (2009) Simultaneous analysis of Lasso and Dantzig selector. Ann Stat 37:1705–1732

  • Blazere M, Loubes J-M, Gamboa F (2014) Oracle inequalities for a group lasso procedure applied to generalized linear models in high dimension. IEEE Trans Inf Theory 60(4):2303–2318

  • Bühlmann P, Van De Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin

  • Bunea F et al (2008) Honest variable selection in linear and logistic regression models via \(\ell _1\) and \(\ell _1+ \ell _2\) penalization. Electron J Stat 2:1153–1194

  • Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351

  • Dettling M (2004) Bagboosting for tumor classification with gene expression data. Bioinformatics 20(18):3583–3593

  • Dobson AJ, Barnett A (2008) An introduction to generalized linear models. CRC Press, Boca Raton

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22

  • Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537

  • Huang Y, Wang C (2001) Consistent functional methods for logistic regression with errors in covariates. J Am Stat Assoc 96(456):1469–1482

  • Huang J, Ma S, Zhang C-H (2008) The iterated lasso for high-dimensional logistic regression. The University of Iowa, Department of Statistics and Actuarial Sciences

  • Kwemou M (2016) Non-asymptotic oracle inequalities for the lasso and group lasso in high dimensional logistic model. ESAIM Probab Stat 20:309–331

  • Lee SI, Lee H, Abbeel P, Ng AY (2014) Efficient \(l_1\) regularized logistic regression. In: National conference on artificial intelligence

  • Loh P-L, Wainwright MJ (2013) Regularized \(m\)-estimators with nonconvexity: statistical and algorithmic theory for local optima. In: Advances in neural information processing systems, pp 476–484

  • Meier L, Van De Geer S, Bühlmann P (2008) The group lasso for logistic regression. J R Stat Soc Ser B (Stat Methodol) 70(1):53–71

  • Negahban S, Ravikumar P, Wainwright MJ, Yu B (2011) A unified framework for high-dimensional analysis of \(m\)-estimators with decomposable regularizers. In: Advances in neural information processing systems (NIPS)

  • Negahban SN, Ravikumar P, Wainwright MJ, Yu B et al (2012) A unified framework for high-dimensional analysis of \( m \)-estimators with decomposable regularizers. Stat Sci 27(4):538–557

  • Sakhanenko AI (1991) Berry-Esseen type estimates for large deviation probabilities. Sib Math J 32(4):647–656

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288

  • Van de Geer S (2007) The deterministic lasso. Seminar für Statistik, Eidgenössische Technische Hochschule (ETH) Zürich

  • Van de Geer SA (2008) High-dimensional generalized linear models and the lasso. Ann Stat 36:614–645

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68(1):49–67

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320


Acknowledgements

We thank the anonymous referees and an associate editor for their helpful and constructive comments, which greatly improved this manuscript. This work was supported by GJJ160927.

Author information

Corresponding author

Correspondence to Zanhua Yin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1

Let \(\delta ={\hat{\beta }}-\beta ^0\). Recall that \(I=\{j: \beta ^0_j \ne 0\}\). For convenience, we write \({\ell }_{\omega }\left( \beta \right) \) to denote \({\ell }_{\omega }\left( \beta ; X, Y\right) \), and similarly, \(\nabla {\ell }_{\omega }\left( \beta \right) =\nabla {\ell }_{\omega }\left( \beta ; X, Y\right) \) and \(\nabla ^2{\ell }_{\omega }\left( \beta \right) =\nabla ^2{\ell }_{\omega }\left( \beta ; X, Y\right) \). By the definition of the estimator \({\hat{\beta }}\), we have

$$\begin{aligned} {\ell }_{\omega }\left( {\hat{\beta }}\right) -{\ell }_{\omega }\left( \beta ^0\right) \le \lambda \left( \Vert \beta ^0\Vert _1-\Vert {\hat{\beta }}\Vert _1\right) \le \lambda \Vert \delta _I\Vert _1-\lambda \Vert \delta _{I^c}\Vert _1. \end{aligned}$$
(14)

Since \({\ell }_{\omega }\left( \beta \right) \) is a convex function, we obtain

$$\begin{aligned} {\ell }_{\omega }\left( {\hat{\beta }}\right) -{\ell }_{\omega }\left( \beta ^0\right) \ge \delta ^T\nabla {\ell }_{\omega }\left( \beta ^0\right) \ge -\Vert \nabla {\ell }_{\omega }\left( \beta ^0\right) \Vert _{\infty }\Vert \delta \Vert _1. \end{aligned}$$
(15)

Define the event

$$\begin{aligned} A=\left\{ \Vert \nabla {\ell }_{\omega }\left( \beta ^0\right) \Vert _{\infty }\le c\lambda \right\} ,~ \text {for}~ 0<c<1. \end{aligned}$$
(16)

Combining (14) and (15), on the event A we have

$$\begin{aligned} \Vert \delta _{I^c}\Vert _1\le \frac{1+c}{1-c}\Vert \delta _{I}\Vert _1. \end{aligned}$$

Therefore, on the event A we have \(\delta \in \triangle _{\alpha }\), for \(\alpha =\frac{1+c}{1-c}\).
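
In more detail, substituting the bound (15) into (14) on the event A gives

$$\begin{aligned} -c\lambda \left( \Vert \delta _I\Vert _1+\Vert \delta _{I^c}\Vert _1\right) \le \lambda \Vert \delta _I\Vert _1-\lambda \Vert \delta _{I^c}\Vert _1, \end{aligned}$$

which rearranges to \((1-c)\Vert \delta _{I^c}\Vert _1\le (1+c)\Vert \delta _I\Vert _1\), i.e. the cone condition displayed above.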

Define the function \(g(t)={\ell }_{\omega }(\beta ^0+t\delta )\). By assumptions (A1) and (A4), we have

$$\begin{aligned}&|g^{'''}(t)|\le c_0\max _{1\le i\le n}\mid x^T_i\delta \mid g^{''}(t)\\&\le c_0\left( \sup _{1\le i\le n, 1\le j\le p}\mid x_{ij}\mid \right) \Vert \delta \Vert _1g^{''}(t) \le c_0R\Vert \delta \Vert _1g^{''}(t), \end{aligned}$$

and, invoking the condition \(\delta \in \triangle _{\alpha }\) (so that \(\Vert \delta \Vert _1\le (\alpha +1)\Vert \delta _I\Vert _1\)), we obtain

$$\begin{aligned} |g^{'''}(t)|\le c_0R(\alpha +1)\Vert \delta _I\Vert _1g^{''}(t)\le c_0R(\alpha +1)\sqrt{s}\Vert \delta _I\Vert _2g^{''}(t). \end{aligned}$$
(17)

Denote \({\bar{R}}=c_0R(\alpha +1)\sqrt{s}\). Then, by Proposition 1 of Bach et al. (2010), we have

$$\begin{aligned} {\ell }_{\omega }({\hat{\beta }})-{\ell }_{\omega }(\beta ^0)\ge & {} \delta ^T\nabla {\ell }_{\omega }(\beta ^0)+\frac{\delta ^T\nabla ^2{\ell }_{\omega }(\beta ^0)\delta }{{\bar{R}}^2\Vert \delta _I\Vert ^2_2}\left( e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1\right) \nonumber \\\ge & {} -c\lambda \Vert \delta \Vert _1+\frac{\delta ^T\nabla ^2{\ell }_{\omega }(\beta ^0)\delta }{{\bar{R}}^2\Vert \delta _I\Vert ^2_2}\left( e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1\right) . \end{aligned}$$
(18)

Combining (14) and (18), on the event A we obtain

$$\begin{aligned} \frac{\delta ^T\nabla ^2{\ell }_{\omega }(\beta ^0)\delta }{{\bar{R}}^2\Vert \delta _I\Vert ^2_2}(e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1)\le c\lambda \Vert \delta \Vert _1+\lambda \Vert \delta _{I}\Vert _1-\lambda \Vert \delta _{I^c}\Vert _1. \end{aligned}$$

Adding \((1-c)\lambda \Vert \delta \Vert _1\) to both sides of the above inequality, noting that \(c\lambda \Vert \delta \Vert _1+(1-c)\lambda \Vert \delta \Vert _1+\lambda \Vert \delta _I\Vert _1-\lambda \Vert \delta _{I^c}\Vert _1=2\lambda \Vert \delta _I\Vert _1\), and invoking the restricted eigenvalue condition (A5), we obtain

$$\begin{aligned} \frac{\rho }{{\bar{R}}^2}\left( e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1\right) +(1-c)\lambda \Vert \delta \Vert _1 \le 2\lambda \Vert \delta _I\Vert _1. \end{aligned}$$
(19)

Dropping the nonnegative term \((1-c)\lambda \Vert \delta \Vert _1\) and using the fact that \(\Vert \delta _I\Vert _1\le \sqrt{s}\Vert \delta _I\Vert _2\), we have

$$\begin{aligned} e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1 \le \frac{2\lambda {\bar{R}}^2\sqrt{s}}{\rho }\Vert \delta _I\Vert _2. \end{aligned}$$
(20)

With a short calculation, we obtain, for all \(t\in [0, 1)\), that

$$\begin{aligned} \exp \left( \frac{-2t}{1-t}\right) +(1-t)\frac{2t}{1-t}-1 \ge 0. \end{aligned}$$

Setting \(t={\bar{R}}\Vert \delta _I\Vert _2/\left( 2+{\bar{R}}\Vert \delta _I\Vert _2\right) \), we have

$$\begin{aligned} e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1 \ge \frac{{\bar{R}}^2\Vert \delta _I\Vert ^2_2}{2+{\bar{R}}\Vert \delta _I\Vert _2}. \end{aligned}$$

Combining this with (20), we obtain

$$\begin{aligned} \frac{\Vert \delta _I\Vert _2}{2+{\bar{R}}\Vert \delta _I\Vert _2} \le \frac{2\lambda \sqrt{s}}{\rho }. \end{aligned}$$

If \(\lambda s < \frac{c(1-c)\rho }{4c_0R}\), we have \({\bar{R}}\Vert \delta _I\Vert _2\le \frac{2c}{1-c}\) and consequently

$$\begin{aligned} e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1 \ge \frac{(1-c){\bar{R}}^2}{2}\Vert \delta _I\Vert ^2_2. \end{aligned}$$
(21)
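
For completeness, the bound \({\bar{R}}\Vert \delta _I\Vert _2\le \frac{2c}{1-c}\) is obtained as follows. Since \(\alpha =\frac{1+c}{1-c}\), we have \({\bar{R}}=c_0R(\alpha +1)\sqrt{s}=\frac{2c_0R\sqrt{s}}{1-c}\). Multiplying the inequality \(\frac{\Vert \delta _I\Vert _2}{2+{\bar{R}}\Vert \delta _I\Vert _2}\le \frac{2\lambda \sqrt{s}}{\rho }\) by \({\bar{R}}\) and using \(\lambda s<\frac{c(1-c)\rho }{4c_0R}\) gives

$$\begin{aligned} \frac{{\bar{R}}\Vert \delta _I\Vert _2}{2+{\bar{R}}\Vert \delta _I\Vert _2}\le \frac{2\lambda \sqrt{s}{\bar{R}}}{\rho }=\frac{4c_0R\lambda s}{(1-c)\rho }<c, \end{aligned}$$

so that \((1-c){\bar{R}}\Vert \delta _I\Vert _2<2c\). Consequently \(2+{\bar{R}}\Vert \delta _I\Vert _2\le \frac{2}{1-c}\), and (21) follows from the lower bound \(e^{-{\bar{R}}\Vert \delta _I\Vert _2}+{\bar{R}}\Vert \delta _I\Vert _2-1\ge \frac{{\bar{R}}^2\Vert \delta _I\Vert ^2_2}{2+{\bar{R}}\Vert \delta _I\Vert _2}\) established above.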

Combining (19) and (21), we obtain

$$\begin{aligned} \frac{(1-c)\rho }{2}\Vert \delta _I\Vert ^2_2+(1-c)\lambda \Vert \delta \Vert _1 \le 2\lambda \Vert \delta _I\Vert _1. \end{aligned}$$

Using \(\Vert \delta _I\Vert _1\le \sqrt{s}\Vert \delta _I\Vert _2\) and the inequality \(2uv \le au^2+v^2/a\), valid for any \(a>0\), we further obtain

$$\begin{aligned} \frac{(1-c)\rho }{2}\Vert \delta _I\Vert ^2_2+(1-c)\lambda \Vert \delta \Vert _1 \le a\lambda ^2 s+\frac{1}{a}\Vert \delta _I\Vert ^2_2. \end{aligned}$$

Taking \(a=\frac{2}{(1-c)\rho }\), the \(\Vert \delta _I\Vert ^2_2\) terms cancel, and dividing by \((1-c)\lambda \) we obtain, on the event A, that

$$\begin{aligned} \Vert \delta \Vert _1\le \frac{2}{\rho (1-c)^2}\lambda s. \end{aligned}$$

To conclude the proof, we now show that the choice \(\lambda =(\sqrt{n}c)^{-1}C(\beta ^0)\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \) ensures \({\mathbb {P}}(A^c)\le \epsilon (1+o(1))\).

Denote \(\xi _{ij}=\omega (x^T_i\beta ^0)\left[ F(x_i^T\beta ^0)-Y_i\right] x_{ij}\); then \({\mathbb {E}}(\xi _{ij})=0\) and \({\mathbb {E}}(\xi _{ij}^2)=\omega ^2(x^T_i\beta ^0)F(x_i^T\beta ^0)\left( 1-F(x_i^T\beta ^0)\right) x_{ij}^2\). Denote \(b=\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \); then, by (16), we have

$$\begin{aligned} {\mathbb {P}}(A^c)= & {} {\mathbb {P}}\left\{ \left\| \nabla {\ell }_{\omega }(\beta ^0)\right\| _{\infty }>c\lambda \right\} \\= & {} {\mathbb {P}}\left\{ \underset{1\le j\le p}{\max }\left| \frac{1}{n}\sum _{i=1}^{n}\left\{ \omega (x^T_i\beta ^0)\left[ F(x_i^T\beta ^0)-Y_i\right] x_{ij}\right\} \right|> c\lambda \right\} \\\le & {} p \underset{1\le j\le p}{\max }{\mathbb {P}}\left\{ \left| \frac{1}{n}\sum _{i=1}^{n}\left\{ \omega (x^T_i\beta ^0)\left[ F(x_i^T\beta ^0)-Y_i\right] x_{ij}\right\} \right|> c\lambda \right\} \\= & {} p \underset{1\le j\le p}{\max }{\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}\xi _{ij}\right| > \sqrt{n}C(\beta ^0)b\right\} . \end{aligned}$$

We now use Lemma 6 to estimate \({\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}\xi _{ij}\right| > \sqrt{n}C(\beta ^0)b\right\} \).

Lemma 6

(Sakhanenko type moderate deviation theorem (Sakhanenko 1991)) Let \(Z_1,\cdots , Z_n\) be independent random variables with \({\mathbb {E}}(Z_i)=0\) and \(\mid Z_i\mid <1\) for all \(1\le i \le n\). Denote \(B_n^2=\sum _{i=1}^{n}{\mathbb {E}}(Z_i^2)\) and \(L_n=\sum _{i=1}^{n}{\mathbb {E}}( \mid Z_i\mid ^3)/B_n^3\). Then there exists a positive constant A such that for all \(x\in [1, \frac{1}{A}\min \{B_n, L_n^{-1/3}\}]\)

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^{n}Z_i> xB_n\right) =(1+O(1)x^3L_n)(1-\Phi (x)). \end{aligned}$$

Since \(Y_i\in \{0, 1\}\) and \({\mathbb {P}}(Y_i=1)=F(x_i^{T}\beta ^0)\le 1\), with assumption (A1) we have

$$\begin{aligned} \mid \xi _{ij}\mid \le \left( \underset{1\le i\le n, 1\le j\le p}{\sup }\mid x_{ij}\mid \right) (\mid \omega (x^T_i\beta ^0)\mid )(\mid F(x_i^T\beta ^0)-Y_i\mid )\le RK_1, \end{aligned}$$

with a positive constant \(K_1=\underset{1\le i\le n}{\sup }\omega (x^T_i\beta ^0)\).

Let \(Z_{ij}=\xi _{ij}/(RK_1)\); then \({\mathbb {E}}Z_{ij}=0\) and \(\mid Z_{ij}\mid \le 1\). Furthermore, with assumption (A6) we have

$$\begin{aligned} B_{nj}^2= & {} \sum _{i=1}^{n}{\mathbb {E}}Z_{ij}^2= (RK_1)^{-2}\sum _{i=1}^{n}{\mathbb {E}}\xi _{ij}^2\le nC^2(\beta ^0)(RK_1)^{-2},\\ L_{nj}= & {} \sum _{i=1}^{n}{\mathbb {E}}(\mid Z_{ij}\mid ^3)/B_{nj}^3\le \sum _{i=1}^{n}{\mathbb {E}}(\mid Z_{ij}\mid ^2)/B_{nj}^3=\frac{1}{B_{nj}}. \end{aligned}$$

Then, \(B_{nj}=O(\sqrt{n})\) and \(L_{nj}=O(1/\sqrt{n})\). By Lemma 6, we have

$$\begin{aligned} {\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}\xi _{ij}\right|> \sqrt{n}C(\beta ^0)b\right\}= & {} {\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}\frac{\xi _{ij}}{RK_1}\right|> \frac{\sqrt{n}C(\beta ^0)}{RK_1}b\right\} \\= & {} {\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}Z_{ij}\right|> \frac{\sqrt{n}C(\beta ^0)}{RK_1}b\right\} \\\le & {} {\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}Z_{ij}\right| > B_{nj}b\right\} \\= & {} 2(1+O(1)b^3L_{nj})(1-\Phi (b))\\= & {} \frac{\epsilon }{p}(1+O(\frac{b^3}{\sqrt{n}})) \end{aligned}$$

where the second-to-last step follows because \(b=O(\sqrt{\log (2p/\epsilon )})\) (see the proof of Theorem 4 for details). As \(n, p\rightarrow \infty \) with \(n\le p =o(e^{n^{1/3}})\), we have

$$\begin{aligned} {\mathbb {P}}(A^c)\le \epsilon (1+o(1)). \end{aligned}$$

This concludes the proof. \(\square \)

Proof of Theorem 2

According to the proof of Theorem 1, we only need to verify that the chosen weight function (10) makes the loss function \(\ell _{\omega }\) satisfy Assumptions (A3) and (A4). Plugging the weight function (10) into the weighted score function (3), we have

$$\begin{aligned} \nabla {\ell }_{\omega }(\beta ; X, Y)=\frac{1}{2n}\sum ^n_{i=1}\left\{ (1-Y_i)\exp \left( \frac{\beta ^Tx_i}{2}\right) -Y_i\exp \left( -\frac{\beta ^Tx_i}{2}\right) \right\} x_i, \end{aligned}$$

and then the loss function \(\ell _{\omega }\) is given by

$$\begin{aligned} {\ell }_{\omega }(\beta ; X, Y)=\frac{1}{n}\sum ^n_{i=1}\left\{ (1-Y_i)\exp \left( \frac{\beta ^Tx_i}{2}\right) +Y_i\exp \left( -\frac{\beta ^Tx_i}{2}\right) \right\} . \end{aligned}$$

Because \(e^{t}\) and \(e^{-t}\) are both convex and three times differentiable, the loss function \(\ell _{\omega }(\beta ; X, Y)\) above satisfies Assumption (A3).

For \(u, v\in \mathbb {R}^{p}\), denote \(g(t)=\ell _{\omega }(u+tv; X, Y)\); then we have

$$\begin{aligned} g^{'}(t)= & {} \frac{1}{2n}\sum ^n_{i=1}\left\{ (1-Y_i)\exp \left( \frac{u^Tx_i+tv^Tx_i}{2}\right) \right. \\&\quad \left. -Y_i\exp \left( -\frac{u^Tx_i+tv^Tx_i}{2}\right) \right\} v^{T}x_i,\\ g^{''}(t)= & {} \frac{1}{4n}\sum ^n_{i=1}\left\{ (1-Y_i)\exp \left( \frac{u^Tx_i+tv^Tx_i}{2}\right) \right. \\&\quad \left. +Y_i\exp \left( -\frac{u^Tx_i+tv^Tx_i}{2}\right) \right\} (v^{T}x_i)^2,\\ g^{'''}(t)= & {} \frac{1}{8n}\sum ^n_{i=1}\left\{ (1-Y_i)\exp \left( \frac{u^Tx_i+tv^Tx_i}{2}\right) \right. \\&\quad \left. -Y_i\exp \left( -\frac{u^Tx_i+tv^Tx_i}{2}\right) \right\} (v^{T}x_i)^3. \end{aligned}$$

Then, for all \(u, v\in \mathbb {R}^{p}\) and for all \(t\in \mathbb {R}\), we have

$$\begin{aligned} \mid g^{'''}(t)\mid\le & {} \frac{1}{2}\left| \frac{1}{4n}\sum ^n_{i=1}\left\{ (1-Y_i)\exp \left( \frac{u^Tx_i+tv^Tx_i}{2}\right) \right. \right. \\&\quad \left. \left. -Y_i\exp \left( -\frac{u^Tx_i+tv^Tx_i}{2}\right) \right\} (v^{T}x_i)^2\right| (\max _{1\le i\le n}\mid v^Tx_i\mid )\\\le & {} \frac{1}{2}(\max _{1\le i\le n}\mid v^Tx_i\mid ) \left\{ \frac{1}{4n}\sum ^n_{i=1}\left[ \left| (1-Y_i)\exp \left( \frac{u^Tx_i+tv^Tx_i}{2}\right) \right| \right. \right. \\&\quad \left. \left. +\left| Y_i\exp \left( -\frac{u^Tx_i+tv^Tx_i}{2}\right) \right| \right] (v^{T}x_i)^2\right\} \\= & {} \frac{1}{2}(\max _{1\le i\le n}\mid v^Tx_i\mid ) \mid g^{''}(t)\mid . \end{aligned}$$

Therefore, the loss function \(\ell _{\omega }(\beta ; X, Y)\) also satisfies Assumption (A4). \(\square \)
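
As an informal sanity check of the bound just established (our own illustration, not part of the paper), one can compare a finite-difference approximation of \(g'''(0)\) with \(\frac{1}{2}\left( \max _{1\le i\le n}\mid v^Tx_i\mid \right) g''(0)\) on simulated data; all quantities below are randomly generated for illustration only.

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n).astype(float)
u = rng.normal(size=p)
v = rng.normal(size=p)

def g(t):
    # g(t) = ell_omega(u + t v) with the exponential-weight loss above
    eta = X @ (u + t * v) / 2.0
    return np.mean((1 - y) * np.exp(eta) + y * np.exp(-eta))

h = 1e-3
g2 = (g(h) - 2 * g(0.0) + g(-h)) / h ** 2                          # central difference for g''(0)
g3 = (g(2 * h) - 2 * g(h) + 2 * g(-h) - g(-2 * h)) / (2 * h ** 3)  # central difference for g'''(0)
bound = 0.5 * np.max(np.abs(X @ v)) * g2
print(abs(g3), "<=", bound)  # inequality should hold up to finite-difference error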

Proof of Theorem 4

We just need to prove that, when \(s(\sqrt{n})^{-1}\sqrt{\log (2p/\epsilon )}\rightarrow 0\), the \(\lambda \) chosen by (8) or (11) satisfies \(\lambda s \le \frac{c(1-c)\rho }{4c_0R}\). Note that, for any \(t > 0\), we have

$$\begin{aligned} 1-\Phi (t) \le \frac{\phi (t)}{t}, \end{aligned}$$

where \(\phi (\cdot )\) is the density function of the standard normal distribution. Letting \(t=\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \), the above inequality becomes

$$\begin{aligned} \frac{\epsilon }{2p}=1-\Phi (t)\le \frac{\phi (t)}{t}=\frac{\exp (-t^2/2)}{\sqrt{2\pi }t}. \end{aligned}$$

If \(p/\epsilon > 2\), then \(t> \Phi ^{-1}(3/4)>1/\sqrt{2\pi }\), so that \(\sqrt{2\pi }t>1\) and therefore

$$\begin{aligned} \frac{\epsilon }{2p}=1-\Phi (t) < \exp (-t^2/2), \end{aligned}$$

and then \(t < \sqrt{2\log (2p/\epsilon )}\). Hence, \(\Phi ^{-1}(1-\frac{\epsilon }{2p})=O(\sqrt{\log (2p/\epsilon )})\) and

$$\begin{aligned} \lambda = \frac{C(\beta ^0)\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) }{\sqrt{n}c}=O\left( \sqrt{\frac{\log (2p/\epsilon )}{n}}\right) . \end{aligned}$$

Hence, if \(s(\sqrt{n})^{-1}\sqrt{\log (2p/\epsilon )}\rightarrow 0\), then \(\lambda s\rightarrow 0\), so \(\lambda s \le \frac{c(1-c)\rho }{4c_0R}\) for all sufficiently large \(n\). \(\square \)
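
As a numerical illustration of the plug-in choice of \(\lambda \) used above (our own sketch; the constants \(C(\beta ^0)\) and \(c\) come from the theory and are treated here as user-supplied inputs, with hypothetical values in the example call):

import numpy as np
from scipy.stats import norm

def tuning_lambda(n, p, eps, C_beta0, c):
    # lambda = C(beta^0) * Phi^{-1}(1 - eps/(2p)) / (c * sqrt(n)), as in the proof of Theorem 1
    return C_beta0 * norm.ppf(1.0 - eps / (2.0 * p)) / (c * np.sqrt(n))

print(tuning_lambda(n=100, p=1000, eps=0.05, C_beta0=1.0, c=0.5))  # hypothetical values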

Cite this article

Yin, Z. Variable selection for sparse logistic regression. Metrika 83, 821–836 (2020). https://doi.org/10.1007/s00184-020-00764-4
