Published by De Gruyter, September 17, 2019

Bi-level feature selection in high dimensional AFT models with applications to a genomic study

  • Hailin Huang, Jizi Shangguan, Peifeng Ruan and Hua Liang

Abstract

We propose a new bi-level feature selection method for high-dimensional accelerated failure time (AFT) models, obtained by reformulating the AFT model as a single-index model. The method yields solutions that are sparse at both the group level and the individual feature level, and it comes with an algorithm that is computationally efficient and easy to implement. We illustrate the method on a genomic dataset and present a simulation study of its finite-sample performance.

Acknowledgements

The authors thank the Associate Editor, Dr. Korbinian Strimmer, and a referee for their constructive suggestions and comments, which substantially improved an earlier version of this paper. Liang’s research was partially supported by NSF grant DMS-1620898.

Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2019-0016).


Published Online: 2019-09-17

© 2019 Walter de Gruyter GmbH, Berlin/Boston
