Abstract
Quantile regression models have become a widely used statistical tool in genetics and in the omics fields because they can provide a rich description of the predictors’ effects on an outcome without imposing stringent parametric assumptions on the outcome-predictor relationship. This work considers the problem of selecting grouped variables in high-dimensional linear quantile regression models. We introduce a group penalized pseudo quantile regression (GPQR) framework with both group-lasso and group non-convex penalties. We approximate the quantile regression check function with a pseudo-quantile check function. Then, using the majorization–minimization principle, we derive a simple and computationally efficient group-wise descent algorithm to solve the group penalized quantile regression problem. We establish the convergence rate of our algorithm with the group-lasso penalty and illustrate the performance of the GPQR approach using simulations in high-dimensional settings. Furthermore, we demonstrate the use of the GPQR method in a gene-based association analysis of data from the Alzheimer’s Disease Neuroimaging Initiative study and in an epigenetic analysis of DNA methylation data.
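As a rough illustration of the ingredients named in the abstract, the following Python sketch shows the standard quantile check function, one possible smooth surrogate for it, and the group soft-thresholding operator that typically appears in group-wise descent updates under a group-lasso penalty. The Huber-type surrogate and its parameter `delta` are assumptions chosen for illustration; the abstract does not specify the form of the pseudo-quantile check function, so this should not be read as the authors' definition or algorithm.

```python
import numpy as np

def check_loss(u, tau):
    """Standard quantile check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def smoothed_check_loss(u, tau, delta=0.1):
    """Illustrative smooth surrogate for the check function.

    ASSUMPTION: a Huber-type smoothing of |u| is used as a stand-in; it is
    not necessarily the pseudo-quantile check function of the paper.
    Uses the identity rho_tau(u) = 0.5 * |u| + (tau - 0.5) * u.
    """
    u = np.asarray(u, dtype=float)
    # Quadratic near zero, exactly |u| once |u| exceeds delta.
    abs_smooth = np.where(np.abs(u) <= delta,
                          u ** 2 / (2.0 * delta) + delta / 2.0,
                          np.abs(u))
    return 0.5 * abs_smooth + (tau - 0.5) * u

def group_soft_threshold(z, lam):
    """Group soft-thresholding: shrink the whole vector z toward zero.

    Returns the zero vector when ||z||_2 <= lam, so an entire group of
    coefficients is dropped or kept together.
    """
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z
```

For example, `group_soft_threshold([3.0, 4.0], 1.0)` scales the vector by 0.8, while `group_soft_threshold([0.3, 0.4], 1.0)` sets the whole group to zero. This all-or-nothing behavior is what makes group penalties select variables at the group level.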
Acknowledgements
This work is supported by the Natural Sciences and Engineering Research Council of Canada through an individual discovery research grant to Karim Oualkacha and by the Fonds de recherche du Québec-Santé through individual Grant #31110 to Karim Oualkacha. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense Award Number W81XWH-12-2-0012). The ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research and Development, LLC.; Johnson and Johnson Pharmaceutical Research and Development LLC.; Lumosity; Lundbeck; Merck and Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research provides funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Additional information
For the Alzheimer’s Disease Neuroimaging Initiative.
Supplementary Information
The supplementary document (a .pdf file) includes proofs of Propositions 1, 2, and 3 and of Theorem 1 of the main manuscript. It also contains the theoretical and numerical developments of the KKT conditions, three additional illustrative figures, and one table of results from the analysis of the ADNI data using our GPQR approach.
Cite this article
Ouhourane, M., Yang, Y., Benedet, A.L. et al. Group penalized quantile regression. Stat Methods Appl 31, 495–529 (2022). https://doi.org/10.1007/s10260-021-00580-8
DOI: https://doi.org/10.1007/s10260-021-00580-8