Published by De Gruyter October 7, 2019

Stability selection for lasso, ridge and elastic net implemented with AFT models

  • Md Hasinur Rahaman Khan, Anamika Bhadra and Tamanna Howlader

Abstract

Model selection can be highly unstable in data sets with a large number of covariates. We focus on stability selection, a technique that improves the variable selection performance of a range of selection methods by aggregating the results of applying a selection procedure to subsamples of the data; here the observations are subject to right censoring. Accelerated failure time (AFT) models have proved useful in many contexts, including heavy censoring (for example, in cancer survival) and high dimensionality (for example, in microarray data). We implement the stability selection approach with three regularized regression techniques, namely lasso, ridge regression and elastic net, applied to censored data through AFT models. We compare the performance of these techniques with and without stability selection in simulation studies and in two real data examples: a breast cancer data set and a diffuse large B-cell lymphoma data set. The results suggest that stability selection consistently yields a stable selection of variables, and that as the dimension of the data increases, methods with stability selection increasingly outperform methods without it, irrespective of the collinearity between the covariates.
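The abstract describes stability selection only in words. The sketch below shows one way the pieces could fit together, under assumptions that are ours rather than the paper's: the AFT model is fitted to log survival time by Kaplan-Meier weighted least squares (Stute's 1993 weighting, a common choice for censored AFT regression), and the penalized fit is a hand-written coordinate-descent elastic net in which `alpha=1` gives the lasso and `alpha=0` gives ridge. The helper names (`stute_weights`, `weighted_enet`, `stability_selection`), the half-size subsamples, the lambda grid, the 0.6 threshold and the `tol` cutoff are all hypothetical illustrations, not the authors' implementation. Because ridge rarely produces exact zeros, `tol` effectively decides which ridge coefficients count as "selected".

```python
import numpy as np


def stute_weights(time, event):
    """Kaplan-Meier (Stute-type) weights; censored observations get weight 0."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    # Sort by time; within ties, events go before censorings (assumed convention).
    order = np.lexsort((1 - event, time))
    d = event[order]
    w_sorted = np.zeros(n)
    surv = 1.0  # Kaplan-Meier survival just before the current sorted point
    for i in range(n):
        w_sorted[i] = surv * d[i] / (n - i)  # n - i subjects at risk (0-based i)
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    w = np.empty(n)
    w[order] = w_sorted  # map weights back to the original order
    return w


def weighted_enet(X, y, w, lam, alpha, n_iter=500, eps=1e-10):
    """Coordinate descent for the weighted elastic-net objective
    0.5 * sum_i w_i (y_i - b0 - x_i @ b)**2
        + lam * (alpha * |b|_1 + 0.5 * (1 - alpha) * |b|_2**2).
    alpha = 1 is the lasso, alpha = 0 is ridge; the unpenalized intercept b0
    is removed by weighted centring and is not returned."""
    p = X.shape[1]
    beta = np.zeros(p)
    sw = w.sum()
    if sw <= 0:  # every observation censored: nothing to fit
        return beta
    xbar = w @ X / sw
    ybar = w @ y / sw
    sq = np.sqrt(w)
    Xs = sq[:, None] * (X - xbar)
    r = sq * (y - ybar)  # residual for beta = 0
    col_sq = (Xs ** 2).sum(axis=0)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            denom = col_sq[j] + lam * (1.0 - alpha)
            if denom <= 0:
                continue  # constant column and no ridge term: keep at 0
            z = Xs[:, j] @ r + col_sq[j] * beta[j]  # inner product with partial residual
            new = np.sign(z) * max(abs(z) - lam * alpha, 0.0) / denom  # soft-threshold
            r -= Xs[:, j] * (new - beta[j])
            max_step = max(max_step, abs(new - beta[j]))
            beta[j] = new
        if max_step < eps:
            break
    return beta


def stability_selection(X, time, event, lambdas, alpha=1.0, n_sub=100,
                        threshold=0.6, tol=1e-8, seed=0):
    """Selection frequencies over half-size subsamples, maximized over lambdas."""
    rng = np.random.default_rng(seed)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n, p = X.shape
    y = np.log(time)  # AFT: linear model for log survival time
    freq = np.zeros((len(lambdas), p))
    for _ in range(n_sub):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample without replacement
        w = stute_weights(time[idx], event[idx])
        for k, lam in enumerate(lambdas):
            beta = weighted_enet(X[idx], y[idx], w, lam, alpha)
            freq[k] += np.abs(beta) > tol
    freq /= n_sub
    max_freq = freq.max(axis=0)  # highest selection frequency across the lambda grid
    return max_freq, np.flatnonzero(max_freq >= threshold)
```

Taking the maximum selection frequency over a grid of penalties, rather than fixing one penalty, follows the general stability selection idea; the variables whose frequency clears the threshold form the stable set. Changing `alpha` switches between lasso, elastic net and ridge without touching the resampling loop.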

Acknowledgement

We thank the Institute of Statistical Research and Training (ISRT), University of Dhaka, Bangladesh, for giving us the platform to conduct this research.

  1. Conflict of interest statement: The authors have declared no conflict of interest.


Published Online: 2019-10-07

© 2019 Walter de Gruyter GmbH, Berlin/Boston

https://www.degruyter.com/document/doi/10.1515/sagmb-2017-0001/html