Abstract
Instability in model selection is a major concern for data sets containing a large number of covariates. We focus on stability selection, a technique that improves variable selection performance for a range of selection methods by aggregating the results of applying a selection procedure to sub-samples of the data, here in the setting where the observations are subject to right censoring. Accelerated failure time (AFT) models have proved useful in many contexts, including heavy censoring (for example, in cancer survival) and high dimensionality (for example, in microarray data). We implement the stability selection approach using three variable selection techniques (Lasso, ridge regression, and elastic net) applied to censored data through AFT models. We compare the performance of these regularized techniques with and without stability selection in simulation studies and two real data examples: a breast cancer data set and a diffuse large B-cell lymphoma data set. The results suggest that stability selection consistently yields a stable set of selected variables and that, as the dimension of the data increases, the performance of methods with stability selection improves relative to methods without it, irrespective of the collinearity between the covariates.
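The aggregation step behind stability selection can be illustrated with a minimal sketch. It is not the implementation used in this study. As simplifying assumptions, the base selector is a top-k ranking by absolute correlation with the log survival time (standing in for Lasso, ridge regression, or elastic net), censoring is ignored, and the names `top_k_selector` and `stability_selection`, the threshold of 0.6, and the toy data are all hypothetical.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return abs(sxy) / denom if denom > 0 else 0.0


def top_k_selector(X, y, k):
    """Stand-in base selector (assumption): keep the k covariates most
    correlated with the response. A penalized AFT fit would go here."""
    p = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability_selection(X, y, selector, n_subsamples=100, threshold=0.6, seed=0):
    """Run the selector on random half-samples drawn without replacement
    and keep covariates whose selection frequency reaches the threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        chosen = selector([X[i] for i in idx], [y[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return freqs, stable


# Toy data (hypothetical): only covariate 0 drives the log survival time.
data_rng = random.Random(1)
n, p = 60, 10
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
log_time = [3.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]

freqs, stable = stability_selection(
    X, log_time, lambda A, b: top_k_selector(A, b, k=2)
)
print("selection frequencies:", freqs)
print("stable covariates:", stable)
```

In this sketch the selector is passed in as a function, so a censoring-aware penalized fit could replace `top_k_selector` without changing the sub-sampling and aggregation logic.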
Acknowledgement
We thank the Institute of Statistical Research and Training (ISRT), University of Dhaka, Bangladesh, for giving us the platform to conduct this research study.
Conflict of interest statement: The authors have declared no conflict of interest.
© 2019 Walter de Gruyter GmbH, Berlin/Boston