Abstract
A new class of survival frailty models based on the generalized inverse-Gaussian (GIG) distributions is proposed. We show that the GIG frailty models are flexible and mathematically convenient, like the popular gamma frailty model. A piecewise-exponential baseline hazard function is employed, which adds flexibility to the proposed class. Although a closed-form observed log-likelihood function is available, simulation studies show that employing an EM algorithm is advantageous compared with direct maximization of this function. Further simulation results compare different methods for obtaining standard errors of the estimates and confidence intervals for the parameters. Additionally, the finite-sample behavior of the EM estimators is investigated and the performance of the GIG models under misspecification is assessed. We apply our methodology to TARGET (Therapeutically Applicable Research to Generate Effective Treatments) data on the survival time of patients with neuroblastoma and show some advantages of the GIG frailties over existing models in the literature.
References
Abrahantes, J. C., Burzykowski, T. (2005). A version of the EM algorithm for proportional hazard model with random effects. Biometrical Journal, 47, 847–862.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Balakrishnan, N., Pal, S. (2016). Expectation maximization-based likelihood inference for flexible cure rate models with Weibull lifetimes. Statistical Methods in Medical Research, 25, 1535–1563.
Balakrishnan, N., Peng, Y. (2006). Generalized gamma frailty model. Statistics in Medicine, 25, 2797–2816.
Balan, T., Putter, H. (2019). frailtyEM: An R package for estimating semiparametric shared frailty models. Journal of Statistical Software, 90, 1–29.
Barndorff-Nielsen, O. E., Halgreen, C. (1977). Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 38, 309–312.
Barreto-Souza, W., Mayrink, V. D. (2019). Semiparametric generalized exponential frailty model for clustered survival data. Annals of the Institute of Statistical Mathematics, 71, 679–701.
Callegaro, A., Iacobelli, S. (2012). The Cox shared frailty model with log-skew-normal frailties. Statistical Modelling, 12, 399–418.
Carroll, W. L. (2013). Safety in numbers: Hyperdiploidy and prognosis. Blood, 121, 2374–237.
Chen, P., Zhang, J., Zhang, R. (2013). Estimation of the accelerated failure time frailty model under generalized gamma frailty. Computational Statistics and Data Analysis, 62, 171–180.
Clayton, D. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141–151.
Cox, D. (1972). Regression models and life-tables. Journal of the Royal Statistical Society - Series B, 34, 187–220.
Crowder, M. (1989). A multivariate distribution with Weibull connections. Journal of the Royal Statistical Society - Series B, 51, 93–107.
Dastugue, N., Suciu, S., Plat, G., Speleman, F., Cave, H., Girard, S., Bakkus, M., Pages, M. P., Yakouben, K., Nelken, B., Uyttebroeck, A., Gervais, C., Lutz, P., Teixeira, M. R., Heimann, P., Ferster, A., Rohrlich, P., Collonge, M. A., Munzer, M., Luquet, I., Boutard, P., Sirvent, N., Karrasch, M., Bertrand, Y., Benoit, Y. (2013). Hyperdiploidy with 58–66 chromosomes in childhood B-acute lymphoblastic leukemia is highly curable: 58951 CLG-EORTC results. Blood, 121, 2415–2423.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society - Series B, 39, 1–38.
Donovan, P., Cato, K., Legaie, R., Jayalath, R., Olsson, G., Hall, B., Olson, S., Boros, S., Reynolds, B., Harding, A. (2014). Hyperdiploid tumor cells increase phenotypic heterogeneity within glioblastoma tumors. Molecular BioSystems, 10, 741–758.
Duchateau, L., Janssen, P. (2008). The Frailty Model. New York: Springer.
Duchateau, L., Janssen, P., Lindsey, P., Legrand, C., Nguti, R., Silvester, R. (2002). The shared frailty model and the power for heterogeneity tests in multicenter trials. Computational Statistics and Data Analysis, 40(3), 603–620.
Efron, B. (1979). Bootstrap methods: Another look at the Jackknife. The Annals of Statistics, 7, 1–26.
Emura, T., Nakatochi, M., Murotani, K., Rondeau, V. (2017). A joint frailty-copula model between tumour progression and death for meta-analysis. Statistical Methods in Medical Research, 26(6), 2649–2666.
Emura, T., Matsui, S., Rondeau, V. (2019). Survival Analysis with Correlated Endpoints: Joint Frailty-Copula Models. JSS Research Series in Statistics. Singapore: Springer.
Enki, D. G., Noufaily, A., Farrington, P. (2014). A time-varying shared frailty model with application to infectious diseases. Annals of Applied Statistics, 8, 430–447.
Farrington, C., Unkel, S., Anaya-Izquierdo, K. (2012). The relative frailty variance and shared frailty models. Journal of the Royal Statistical Society - Series B, 74, 673–696.
Fletcher, R. (2000). Practical Methods of Optimization (2nd ed.). New York: Wiley.
Hanagal, D. D. (2019). Modeling Survival Data Using Frailty Models. Singapore: Springer.
Hirsch, K., Wienke, A. (2012). Software for semiparametric shared gamma and log-normal frailty models: an overview. Computer Methods and Programs in Biomedicine, 107(3), 582–597.
Hougaard, P. (1984). Life table methods for heterogeneous populations: Distributions describing the heterogeneity. Biometrika, 71, 75–83.
Hougaard, P. (1986). A class of multivariate failure time distributions. Biometrika, 73, 671–678.
Hougaard, P. (2000). Analysis of Multivariate Survival Data. New York: Springer.
Hougaard, P., Harvald, B., Holm, N. V. (1992). Measuring the similarities between the lifetimes of adult Danish twins born between 1881–1930. Journal of the American Statistical Association, 87, 17–24.
Kaplan, E., Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Kim, J. S., Proschan, F. (1991). Piecewise exponential estimation of the survival function. IEEE Transactions on Reliability, 40, 134–139.
Klein, J. P. (1992). Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics, 48, 795–806.
Lawless, J. F., Zhan, M. (1998). Analysis of interval-grouped recurrent event data using piecewise constant rate function. Canadian Journal of Statistics, 26, 549–565.
Leão, J., Leiva, V., Saulo, H., Tomazella, V. (2017). Birnbaum-Saunders frailty regression models: Diagnostics and application to medical data. Biometrical Journal, 59, 291–314.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society - Series B, 44, 226–233.
McGilchrist, C. A., Aisbett, C. W. (1991). Regression with frailty in survival analysis. Biometrics, 47, 461–466.
Monaco, V., Gorfine, M., Hsu, L. (2018). General semiparametric shared frailty model estimation and simulation with frailtySurv. Journal of Statistical Software, 86, 1–42.
Oakes, D. (1982). A model for association in bivariate survival data. Journal of the Royal Statistical Society - Series B, 44, 414–422.
Oakes, D. (1986). Semiparametric inference in a model for association in bivariate survival data. Biometrika, 73, 353–361.
Peng, M., Xiang, L., Wang, S. (2018). Semiparametric regression analysis of clustered survival data with semi-competing risks. Computational Statistics and Data Analysis, 124, 53–70.
Putter, H., van Houwelingen, H. C. (2015). Dynamic frailty models based on compound birth-death processes. Biostatistics, 16, 550–564.
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Accessed Aug 2020.
Schneider, S., Demarqui, F. N., Colosimo, E. A., Mayrink, V. D. (2019). An approach to model clustered survival data with dependent censoring. Biometrical Journal, 62(1), 157–174.
Therneau, T. (2015). A package for survival analysis in S. R package version 2.43.3. https://CRAN.R-project.org/package=survival. Accessed Aug 2019.
Therneau, T. M., Grambsch, P. M., Pankratz, V. S. (2003). Penalized survival models. Journal of Computational and Graphical Statistics, 12, 156–175.
Vaupel, J. W., Manton, K. G., Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.
Vu, H. T. V., Knuiman, M. W. (2002). A hybrid ML-EM algorithm for calculation of maximum likelihood estimates in semiparametric shared frailty models. Computational Statistics and Data Analysis, 40(1), 173–187.
Wang, H., Klein, J. P. (2012). Semiparametric estimation for the additive inverse Gaussian frailty model. Communications in Statistics - Theory and Methods, 41, 2269–2278.
Wienke, A. (2011). Frailty models in survival analysis. New York: Chapman and Hall/CRC.
Xiao, Y., Abrahamowicz, M. (2010). Bootstrap-based methods for estimating standard errors in Cox’s regression analyses of clustered event times. Statistics in Medicine, 29, 915–923.
Yashin, A., Vaupel, J. W., Iachine, I. (1995). Correlated individual frailty: an advantageous approach to survival analysis of bivariate data. Mathematical Population Studies, 5, 145–159.
Yoshimoto, M., Toledo, S., Caran, E., Seixas, M., Lee, M., Abib, S., Vianna, S., Schettini, S., Andrade, J. (1999). MYCN gene amplification: Identification of cell populations containing double minutes and homogeneously staining regions in neuroblastoma tumors. The American Journal of Pathology, 155, 1439–1443.
Zeng, D., Lin, D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika, 93, 627–640.
Zeng, D., Lin, D. Y. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society - Series B, 69, 507–564.
Acknowledgements
We thank the Associate Editor and two anonymous Referees for their insightful comments and suggestions, which led to a great improvement of the paper. We also thank the National Cancer Institute (Office of Cancer Genomics) for granting us permission to use the TARGET Neuroblastoma Clinical data for publication. W. Barreto-Souza would also like to acknowledge support for his research from the KAUST Research Fund, NIH 1R01EB028753-01, and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-Brazil, Grant number 305543/2018-0). Part of this work is from the Master’s Thesis of Luiza S.C. Piancastelli, carried out at the Department of Statistics of the Universidade Federal de Minas Gerais, Brazil.
Appendix
A.1 Observed information matrix
Define \(c_i(\theta )=\displaystyle \sum _{j=1}^{n_i}H_0(t_{ij})e^{x_{ij}^\top \beta }\) and \(a_{irs}(\theta )=\displaystyle \sum _{j=1}^{n_i}H_0(t_{ij})e^{x_{ij}^\top \beta }x_{ijr}x_{ijs}\), for \(r,s=1,\ldots ,p\), \(\varDelta _i\equiv \varDelta _i(\theta )=\bigg \{\alpha ^{-1}\bigg (\alpha ^{-1}+2\displaystyle \sum _{j=1}^{n_i}H_0(t_{ij})e^{x_{ij}^\top \beta }\bigg )\bigg \}^{1/2}\), and \(\lambda _i^*=\lambda +\sum _{j=1}^{n_i}\delta _{ij}\), for \(i=1,\ldots ,m\). We have that the elements of the observed information matrix \(J_n(\theta _*)=-\partial ^2\ell (\theta )/\partial \theta _*\partial \theta _*^\top \) are given by
for \(r,s=1,\ldots ,p\), and
A.2 Louis information matrix
From Louis (1982), we have that the information matrix obtained from the EM-algorithm, say \(\mathbf{I}_n(\theta _*)\), is given by
where we have defined \(Y^{obs}=\{(t_{ij},\delta _{ij}),\, j=1,\ldots ,n_i,\, i=1,\ldots ,m\}\).
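For reference, Louis's (1982) identity is commonly stated in the following general form. This is a sketch of the standard result and may differ in notation from (10); the symbols \(S_c(\theta )\) and \(S(\theta )\) are introduced here for illustration only.

\[
\mathbf{I}_n(\theta _*)=E\left( -\dfrac{\partial ^2\ell _c(\theta )}{\partial \theta _*\partial \theta _*^\top }\bigg |Y^{obs}\right) -E\left( S_c(\theta )S_c(\theta )^\top \big |Y^{obs}\right) +S(\theta )S(\theta )^\top ,
\]

where \(S_c(\theta )=\partial \ell _c(\theta )/\partial \theta _*\) is the complete-data score and \(S(\theta )=E(S_c(\theta )|Y^{obs})\) is the observed-data score. At the maximum likelihood estimate \(S(\theta )=0\), so the last term vanishes and only the two conditional expectations need to be evaluated.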
Let \(\tau _i(\theta )=E(Z_i^2|Y^{obs})\) and \(\nu _i(\theta )=E(Z_i^{-2}|Y^{obs})\) for \(i=1,\ldots ,m\), where explicit expressions are directly available by using (3) and (9). The elements of the information matrix (10) are given by
and \(E\left( -\dfrac{\partial ^2\ell _c(\theta )}{\partial \beta _r\partial \alpha }\bigg |Y^{obs}\right) =0\), for \(r=1,\ldots ,p\), where we have defined \(a_{ir}(\theta )=\displaystyle \sum _{j=1}^{n_i}H_0(t_{ij})e^{x_{ij}^\top \beta }x_{ijr}\), \(b_{is}(\theta )=\displaystyle \sum _{j=1}^{n_i}H^{(s)}_0(t_{ij})e^{x_{ij}^\top \beta }\), for \(i=1,\ldots ,m\), \(r=1,\dots ,p\), and \(s=1,\ldots ,k+1\).
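The cluster-level quantities defined in this appendix can be computed directly from their definitions, as in the minimal Python sketch below. It rests on assumptions that are not stated in this section: the conditional distribution of \(Z_i\) given \(Y^{obs}\) is taken to be GIG with density proportional to \(z^{\lambda _i^*-1}\exp \{-(az+b/z)/2\}\), where \(a=\alpha ^{-1}+2c_i(\theta )\) and \(b=\alpha ^{-1}\), which is consistent with the definition of \(\varDelta _i\) since \(\sqrt{ab}=\varDelta _i\). All function names and the input layout are hypothetical. The Bessel function \(K_\nu \) is evaluated with a plain trapezoidal rule to keep the example dependency-free; a library routine such as `scipy.special.kv` could be used instead.

```python
import math


def bessel_k(nu, x, steps=2000):
    """K_nu(x) for x > 0, from K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt."""
    # Extend the upper limit until the integrand is bounded by about exp(-40).
    t_max = 1.0
    while x * math.cosh(t_max) - abs(nu) * t_max < 40.0:
        t_max += 0.5
    h = t_max / steps

    def f(t):
        return math.exp(-x * math.cosh(t)) * math.cosh(nu * t)

    # Trapezoidal rule on [0, t_max].
    total = 0.5 * (f(0.0) + f(t_max))
    for k in range(1, steps):
        total += f(k * h)
    return h * total


def gig_moment(r, lam, a, b):
    """E[Z^r] when Z has density proportional to z^(lam-1) exp(-(a z + b / z) / 2)."""
    omega = math.sqrt(a * b)
    return (b / a) ** (r / 2.0) * bessel_k(lam + r, omega) / bessel_k(lam, omega)


def cluster_quantities(H0, X, beta, delta, alpha, lam):
    """Return c_i, [a_ir], Delta_i and lambda_i^* for one cluster i.

    H0[j] is H_0(t_ij), X[j] is the covariate vector x_ij,
    delta[j] is the event indicator delta_ij.
    """
    # w[j] = H_0(t_ij) * exp(x_ij' beta), the common summand of c_i and a_ir.
    w = [h * math.exp(sum(x * b for x, b in zip(xj, beta))) for h, xj in zip(H0, X)]
    c_i = sum(w)
    a_ir = [sum(wj * xj[r] for wj, xj in zip(w, X)) for r in range(len(beta))]
    Delta_i = math.sqrt((1.0 / alpha) * (1.0 / alpha + 2.0 * c_i))
    lam_star = lam + sum(delta)
    return c_i, a_ir, Delta_i, lam_star


def conditional_moments(c_i, lam_star, alpha):
    """tau_i = E(Z_i^2 | Y_obs) and nu_i = E(Z_i^-2 | Y_obs) under the ASSUMED GIG posterior."""
    a, b = 1.0 / alpha + 2.0 * c_i, 1.0 / alpha
    return gig_moment(2, lam_star, a, b), gig_moment(-2, lam_star, a, b)


# Illustrative inputs only (made-up numbers, not taken from the TARGET data).
c_i, a_ir, Delta_i, lam_star = cluster_quantities(
    H0=[0.4, 0.9, 1.3],
    X=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    beta=[0.2, -0.1],
    delta=[1, 0, 1],
    alpha=0.5,
    lam=-0.5,
)
tau_i, nu_i = conditional_moments(c_i, lam_star, alpha=0.5)
```

The moment formula \(E(Z^r)=(b/a)^{r/2}K_{\lambda +r}(\sqrt{ab})/K_{\lambda }(\sqrt{ab})\) used in `gig_moment` is the standard GIG result under the parametrization stated above; if (3) and (9) use a different parametrization, the arguments `a` and `b` would need to be mapped accordingly.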
About this article
Cite this article
Piancastelli, L.S.C., Barreto-Souza, W. & Mayrink, V.D. Generalized inverse-Gaussian frailty models with application to TARGET neuroblastoma data. Ann Inst Stat Math 73, 979–1010 (2021). https://doi.org/10.1007/s10463-020-00774-z