
Weighted Lasso estimates for sparse logistic regression: non-asymptotic properties with measurement errors

Acta Mathematica Scientia

Abstract

For high-dimensional models with a focus on classification performance, ℓ1-penalized logistic regression has become important and popular. However, Lasso estimates can be problematic when the penalties on all coefficients are identical and unrelated to the data. We propose two types of weighted Lasso estimates whose covariate-dependent weights are determined via the McDiarmid inequality. Given sample size n and dimension of covariates p, the finite-sample behavior of our proposed methods with a diverging number of predictors is characterized by non-asymptotic oracle inequalities, namely bounds on the ℓ1-estimation error and the squared prediction error of the unknown parameters. We compare the performance of our method with that of previous weighted estimates on simulated data, and then apply it to real data analysis.
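To make the construction above concrete, here is a minimal sketch of a weighted ℓ1-penalized logistic regression fit in Python. It uses the standard reparametrisation that a weighted Lasso penalty on β is equivalent to an ordinary Lasso penalty after rescaling each column of the design matrix by its weight. The weights used below are an arbitrary illustrative placeholder, not the McDiarmid-inequality weights constructed in the paper, and the function name weighted_lasso_logistic is assumed for this example only.

```python
# Minimal sketch: weighted l1-penalized logistic regression via column rescaling.
# NOTE: the weights below are placeholders for illustration; the paper's
# McDiarmid-based, covariate-dependent weights are not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_lasso_logistic(X, y, weights, lam):
    """Approximately solve
        min_b  (1/n) * neg_loglik(b; X, y) + lam * sum_j weights[j] * |b_j|
    by rescaling column j of X by 1/weights[j]; an ordinary l1 fit on the
    rescaled design with coefficients gamma gives b_j = gamma_j / weights[j]."""
    n, _ = X.shape
    X_scaled = X / weights                      # divide each column j by weights[j]
    clf = LogisticRegression(penalty="l1",
                             C=1.0 / (n * lam), # sklearn's C is the inverse penalty strength
                             solver="liblinear",
                             fit_intercept=True)
    clf.fit(X_scaled, y)
    beta = clf.coef_.ravel() / weights          # map back to the original parametrisation
    return clf.intercept_[0], beta

# Toy usage with simulated sparse data and placeholder weights
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 1.0]
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < prob).astype(int)
w = 1.0 / (np.abs(X.T @ (y - y.mean())) / n + 1e-3)   # placeholder data-dependent weights
intercept, beta_hat = weighted_lasso_logistic(X, y, w, lam=0.05)
```

The column-rescaling trick is used only because scikit-learn does not expose per-coefficient ℓ1 weights directly; any solver that accepts a weighted penalty could be substituted.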



Author information


Corresponding author

Correspondence to Bo Li.

Additional information

Supported by the National Natural Science Foundation of China (61877023) and the Fundamental Research Funds for the Central Universities (CCNU19TD009).


About this article


Cite this article

Huang, H., Gao, Y., Zhang, H. et al. Weighted Lasso estimates for sparse logistic regression: non-asymptotic properties with measurement errors. Acta Math Sci 41, 207–230 (2021). https://doi.org/10.1007/s10473-021-0112-6

