Abstract
We consider the variable selection problem in a sparse logistic regression model. Inspired by the square-root Lasso, we develop a weighted score Lasso for logistic regression. The new method yields an \({\ell }_1\) estimation error bound under assumptions similar to those introduced in Bach et al. (Electron J Stat 4:384–414, 2010). Compared to the standard Lasso, the weighted score Lasso provides a direct choice for the tuning parameter. Both theoretical and simulation results confirm the satisfactory performance of the proposed method. We illustrate our methodology with a real microarray data set.
References
Bach F et al (2010) Self-concordant analysis for logistic regression. Electron J Stat 4:384–414
Belloni A, Chernozhukov V, Wang L (2011) Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4):791–806
Bickel PJ, Ritov Y, Tsybakov AB (2009) Simultaneous analysis of lasso and dantzig selector. Ann Stat 37:1705–1732
Blazere M, Loubes J-M, Gamboa F (2014) Oracle inequalities for a group lasso procedure applied to generalized linear models in high dimension. IEEE Trans Inf Theory 60(4):2303–2318
Bühlmann P, Van De Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
Bunea F et al (2008) Honest variable selection in linear and logistic regression models via \(\ell _1\) and \(\ell _1+ \ell _2\) penalization. Electron J Stat 2:1153–1194
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351
Dettling M (2004) Bagboosting for tumor classification with gene expression data. Bioinformatics 20(18):3583–3593
Dobson AJ, Barnett A (2008) An introduction to generalized linear models. CRC Press, Boca Raton
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
Huang Y, Wang C (2001) Consistent functional methods for logistic regression with errors in covariates. J Am Stat Assoc 96(456):1469–1482
Huang J, Ma S, Zhang C-H (2008) The iterated Lasso for high-dimensional logistic regression. Technical report, Department of Statistics and Actuarial Science, The University of Iowa
Kwemou M (2016) Non-asymptotic oracle inequalities for the lasso and group lasso in high dimensional logistic model. ESAIM Probab Stat 20:309–331
Lee SI, Lee H, Abbeel P, Ng AY (2014) Efficient \(l_1\) regularized logistic regression. In: National conference on artificial intelligence
Loh P-L, Wainwright MJ (2013) Regularized \(m\)-estimators with nonconvexity: statistical and algorithmic theory for local optima. In: Advances in neural information processing systems, pp 476–484
Meier L, Van De Geer S, Bühlmann P (2008) The group lasso for logistic regression. J R Stat Soc Ser B (Stat Methodol) 70(1):53–71
Negahban S, Ravikumar P, Wainwright MJ, Yu B (2011) A unified framework for high-dimensional analysis of \(m\)-estimators with decomposable regularizers. In: Advances in neural information processing systems (NIPS)
Negahban SN, Ravikumar P, Wainwright MJ, Yu B et al (2012) A unified framework for high-dimensional analysis of \( m \)-estimators with decomposable regularizers. Stat Sci 27(4):538–557
Sakhanenko AI (1991) Berry-Esseen type estimates for large deviation probabilities. Sib Math J 32(4):647–656
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Van de Geer S (2007) The deterministic lasso. Seminar für Statistik, Eidgenössische Technische Hochschule (ETH) Zürich
Van de Geer SA (2008) High-dimensional generalized linear models and the lasso. Ann Stat 36:614–645
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68(1):49–67
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320
Acknowledgements
We thank the anonymous referees and an associate editor for their helpful and constructive comments, which greatly improved this manuscript. This work was supported by GJJ160927.
Appendix
Proof of Theorem 1
Let \(\delta ={\hat{\beta }}-\beta ^0\). Recall that \(I=\{j: \beta ^0_j \ne 0\}\). For convenience, we write \({\ell }_{\omega }\left( \beta \right) \) for \({\ell }_{\omega }\left( \beta ; X, Y\right) \), and similarly \(\nabla {\ell }_{\omega }\left( \beta \right) =\nabla {\ell }_{\omega }\left( \beta ; X, Y\right) \) and \(\nabla ^2{\ell }_{\omega }\left( \beta \right) =\nabla ^2{\ell }_{\omega }\left( \beta ; X, Y\right) \). By the definition of the estimator \({\hat{\beta }}\), we have
Since \({\ell }_{\omega }\left( \beta \right) \) is a convex function, we obtain
Define the event
Combining (14) and (15), on the event A we have
Therefore, on the event A we have \(\delta \in \triangle _{\alpha }\), for \(\alpha =\frac{1+c}{1-c}\).
Define the function \(g(t)={\ell }_{\omega }(\beta ^0+t\delta )\). Following assumptions (A1) and (A4), we have
and invoke the condition \(\delta \in \triangle _{\alpha }\) to obtain
Denote \({\bar{R}}=c_0R(\alpha +1)\sqrt{s}\), then, by Proposition 1 of Bach et al. (2010), we have
Combining (14) and (18), on the event A we obtain
Adding \((1-c)\lambda \Vert \delta \Vert _1\) to both sides of the above inequality and invoking the restricted eigenvalue condition (A5), we also have
Using the fact \(\Vert \delta _I\Vert _1\le \sqrt{s}\Vert \delta _I\Vert _2\), we have
With a short calculation, we obtain, for all \(t\in [0, 1)\), that
Set \(t={\bar{R}}\Vert \delta _I\Vert _2/\left( 2+{\bar{R}}\Vert \delta _I\Vert _2\right) \), then we have
This implies using (20) that
If \(\lambda s < \frac{c(1-c)\rho }{4c_0R}\), we have \({\bar{R}}\Vert \delta _I\Vert _2\le \frac{2c}{1-c}\) and consequently
Combining (19) and (21), we obtain
Using the inequality \(2uv \le au^2+v^2/a\), for any \(a>1\), we further obtain
By taking \(a=\frac{2}{(1-c)\rho }\), then we obtain, on the event A, that
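For completeness, the elementary inequality invoked above is a one-line weighted AM–GM step:

```latex
% For any a > 0 and real u, v:
0 \le \left(\sqrt{a}\,u - \frac{v}{\sqrt{a}}\right)^{2}
   = a u^{2} - 2uv + \frac{v^{2}}{a}
\quad\Longrightarrow\quad
2uv \le a u^{2} + \frac{v^{2}}{a}.
```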
To conclude the proof we determine now \(\lambda =(\sqrt{n}c)^{-1}C(\beta ^0)\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \) such that \({\mathbb {P}}(A^c)\le \epsilon (1+o(1))\).
Denote \(\xi _{ij}=\omega (x^T_i\beta ^0)\left[ F(x_i^T\beta ^0)-Y_i\right] x_{ij}\), then \({\mathbb {E}}(\xi _{ij})=0\) and \({\mathbb {E}}(\xi _{ij}^2)=\omega ^2(x^T_i\beta ^0)F(x_i^T\beta ^0)\left( 1-F(x_i^T\beta ^0)\right) x_{ij}^2\). Denote \(b=\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \), then, by (16), we have
We now use Lemma 6 to estimate \({\mathbb {P}}\left\{ \left| \sum _{i=1}^{n}\xi _{ij}\right| > \sqrt{n}C(\beta ^0)b\right\} \).
Lemma 6
(Sakhanenko type moderate deviation theorem (Sakhanenko 1991)) Let \(Z_1,\cdots , Z_n\) be independent random variables with \({\mathbb {E}}(Z_i)=0\) and \(\mid Z_i\mid <1\) for all \(1\le i \le n\). Denote \(B_n^2=\sum _{i=1}^{n}{\mathbb {E}}(Z_i^2)\) and \(L_n=\sum _{i=1}^{n}{\mathbb {E}}( \mid Z_i\mid ^3)/B_n^3\). Then there exists a positive constant A such that for all \(x\in [1, \frac{1}{A}\min \{B_n, L_n^{-1/3}\}]\)
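In the moderate deviation regime, Lemma 6 says the tail probability \({\mathbb {P}}\left( \left| \sum _{i=1}^{n}Z_i\right| > xB_n\right) \) behaves like the Gaussian tail \(2(1-\Phi (x))\). As an illustrative aside (not part of the proof), this can be sanity-checked by a seeded Monte Carlo experiment; the uniform distribution and all constants below are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials, x = 100, 50_000, 1.5

# Z_i ~ Uniform[-1, 1]: mean zero, |Z_i| <= 1, E(Z_i^2) = 1/3
Z = rng.uniform(-1.0, 1.0, size=(trials, n))
B_n = np.sqrt(n / 3.0)  # B_n^2 = sum_i E(Z_i^2) = n/3

# Empirical tail probability vs. the Gaussian tail 2(1 - Phi(x))
p_hat = np.mean(np.abs(Z.sum(axis=1)) > x * B_n)
p_gauss = 2.0 * (1.0 - norm.cdf(x))

print(p_hat, p_gauss)  # the two values should be close
```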
Since \(Y_i\in \{0, 1\}\) and \({\mathbb {P}}(Y_i=1)=F(x_i^{T}\beta ^0)\le 1\), then, with assumption A1 we have
with a positive constant \(K_1=\underset{1\le i\le n}{\sup }\omega (x^T_i\beta ^0)\).
Let \(Z_{ij}=\xi _{ij}/(RK_1)\), then we have \({\mathbb {E}}Z_{ij}=0\), \(\mid Z_{ij}\mid \le 1\). Furthermore, with assumption A6 we have
Then, \(B_{nj}=O(\sqrt{n})\) and \(L_{nj}=O(1/\sqrt{n})\). By Lemma 6, we have
where the second to last step follows because \(b=O(\sqrt{\log (2p/\epsilon )})\) (see the proof of Theorem 4 for details). As \(n, p\rightarrow \infty \) with \(n\le p =o(e^{n^{1/3}})\), we have
This concludes the proof. \(\square \)
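As an illustrative sketch (not part of the proof), the tuning parameter \(\lambda =(\sqrt{n}c)^{-1}C(\beta ^0)\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \) determined above can be computed directly; the values of \(c\), \(\epsilon \), and the stand-in for the constant \(C(\beta ^0)\) below are placeholder assumptions, not quantities derived from data:

```python
import numpy as np
from scipy.stats import norm

def score_lasso_lambda(n, p, eps=0.05, c=0.5, C_beta0=1.0):
    """lambda = (sqrt(n) c)^{-1} C(beta^0) Phi^{-1}(1 - eps/(2p)).

    C_beta0 is a placeholder for the (unspecified) constant C(beta^0).
    """
    return C_beta0 * norm.ppf(1.0 - eps / (2.0 * p)) / (np.sqrt(n) * c)

lam = score_lasso_lambda(n=200, p=1000)
print(lam)  # lambda grows like sqrt(log(p)/n)
```

Note that \(\lambda \) increases (slowly, logarithmically) in \(p\) and decreases at rate \(n^{-1/2}\) in \(n\), consistent with the condition on \(\lambda s\) used in the proof.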
Proof of Theorem 2
By the proof of Theorem 1, it suffices to verify that the chosen weight function (10) makes the loss function \(\ell _{\omega }\) satisfy Assumptions (A3) and (A4). Substituting the weight function (10) into the weighted score function (3), we have
and then, the loss function \(\ell _{\omega }\) is given by
Because \(e^{t}\) and \(e^{-t}\) are both convex and three times differentiable, the above loss function \(\ell _{\omega }(\beta ; X, Y)\) satisfies Assumption (A3).
For \(u, v\in \mathbb {R}^{p}\), define \(g(t)=\ell _{\omega }(u+tv; X, Y)\); then we have
Then, for all \(u, v\in \mathbb {R}^{p}\) and for all \(t\in \mathbb {R}\), we have
Therefore, the loss function \(\ell _{\omega }(\beta ; X, Y)\) also satisfies Assumption (A4). \(\square \)
Proof of Theorem 4
It suffices to show that, when \(s(\sqrt{n})^{-1}\sqrt{\log (2p/\epsilon )}\rightarrow 0\), the \(\lambda \) chosen by (8) or (11) satisfies \(\lambda s \le \frac{c(1-c)\rho }{4c_0R}\). Note that, for any \(t > 0\), we have
where \(\phi (\cdot )\) is the density function of the standard normal distribution. Setting \(t=\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) \), the above inequality becomes
If \(p/\epsilon > 2\), then \(t> \Phi ^{-1}(3/4)>1/\sqrt{2\pi }\). So it is easy to obtain
and then \(t < \sqrt{2\log (2p/\epsilon )}\). Hence, \(\Phi ^{-1}(1-\frac{\epsilon }{2p})=O(\sqrt{\log (2p/\epsilon )})\) and
If \(s(\sqrt{n})^{-1}\sqrt{\log (2p/\epsilon )}\rightarrow 0\), then \(\lambda s \le \frac{c(1-c)\rho }{4c_0R}\). \(\square \)
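The bound used above, \(\Phi ^{-1}\left( 1-\frac{\epsilon }{2p}\right) < \sqrt{2\log (2p/\epsilon )}\) for \(p/\epsilon > 2\), is easy to verify numerically; a quick sketch (the values of \(p\) and \(\epsilon \) are arbitrary):

```python
import math
from scipy.stats import norm

def check_quantile_bound(p, eps):
    """Return (Phi^{-1}(1 - eps/(2p)), sqrt(2 log(2p/eps)))."""
    t = norm.ppf(1.0 - eps / (2.0 * p))
    bound = math.sqrt(2.0 * math.log(2.0 * p / eps))
    return t, bound

for p in (10, 100, 10_000, 10**6):
    t, bound = check_quantile_bound(p, eps=0.05)
    print(p, t, bound)  # t stays below the bound at every p
```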
Yin, Z. Variable selection for sparse logistic regression. Metrika 83, 821–836 (2020). https://doi.org/10.1007/s00184-020-00764-4