Abstract
For high-dimensional models in which classification performance is the main concern, ℓ1-penalized logistic regression has become an important and popular tool. However, Lasso estimates can be problematic when the penalties on different coefficients are all the same and unrelated to the data. We propose two types of weighted Lasso estimates whose weights depend on the covariates and are determined by the McDiarmid inequality. Given the sample size n and the covariate dimension p, the finite-sample behavior of the proposed method with a diverging number of predictors is characterized by non-asymptotic oracle inequalities, including bounds on the ℓ1-estimation error and the squared prediction error for the unknown parameters. We compare the performance of our method with that of existing weighted estimates on simulated data, and then apply it to real data.
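As a rough illustration of the kind of estimator the abstract describes, the sketch below fits a logistic regression with a weighted ℓ1 penalty by proximal gradient descent. It is only a minimal sketch under stated assumptions: the function names, the simulated data, the tuning value `lam`, and in particular the weights (root mean square of each covariate column) are hypothetical placeholders chosen for illustration. They are not the McDiarmid-based weights proposed in the paper, which the abstract does not spell out.

```python
import numpy as np


def sigmoid(z):
    # Logistic function, applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))


def objective(X, y, beta, weights, lam):
    # Average negative log-likelihood plus the weighted l1 penalty.
    eta = X @ beta
    loss = np.mean(np.logaddexp(0.0, eta) - y * eta)
    return loss + lam * np.sum(weights * np.abs(beta))


def weighted_lasso_logistic(X, y, weights, lam, n_iter=500):
    # Proximal gradient descent (ISTA), started from beta = 0.
    n, p = X.shape
    # The gradient of the average logistic loss is Lipschitz with
    # constant ||X||_2^2 / (4n); use its reciprocal as the step size.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        # Coordinate-wise soft-thresholding; coordinate j is
        # thresholded at step * lam * weights[j].
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * weights, 0.0)
    return beta


# Simulated sparse example (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)

# Placeholder data-dependent weights: column root mean squares.
# These stand in for, and are NOT, the paper's McDiarmid-based weights.
weights = np.sqrt(np.mean(X ** 2, axis=0))

lam = 0.05
beta_hat = weighted_lasso_logistic(X, y, weights, lam)
print("number of nonzero coefficients:", int(np.count_nonzero(beta_hat)))
```

Setting every weight to 1 recovers the ordinary Lasso penalty, so the same routine can serve as an unweighted baseline when comparing estimates.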
Additional information
Supported by the National Natural Science Foundation of China (61877023) and the Fundamental Research Funds for the Central Universities (CCNU19TD009).
Cite this article
Huang, H., Gao, Y., Zhang, H. et al. Weighted Lasso estimates for sparse logistic regression: non-asymptotic properties with measurement errors. Acta Math Sci 41, 207–230 (2021). https://doi.org/10.1007/s10473-021-0112-6
DOI: https://doi.org/10.1007/s10473-021-0112-6