Abstract
We propose a new likelihood-based approach for estimation, inference and variable selection for parametric cure regression models in time-to-event analysis under random right-censoring. In this context, it often happens that some subjects are “cured”, i.e., they will never experience the event of interest. Then, the sample of censored observations is an unlabeled mixture of cured and “susceptible” subjects. Using inverse probability censoring weighting (IPCW), we propose a likelihood-based estimation procedure for the cure regression model without making assumptions about the distribution of survival times for the susceptible subjects. The IPCW approach does require a preliminary estimate of the censoring distribution, for which general parametric, semi- or nonparametric approaches can be used. The incorporation of a penalty term in our estimation procedure is straightforward; in particular, we propose \(\ell _1\)-type penalties for variable selection. Our theoretical results are derived under mild assumptions. Simulation experiments and real data analysis illustrate the effectiveness of the new approach.
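The IPCW idea can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes censoring is independent of the covariate (so a pooled Kaplan–Meier estimate of the censoring distribution suffices), and the function names, the binary covariate and the exponential distributions are all hypothetical choices made for illustration. Each subject receives the weight \(\Delta / \hat{S}_C(Y-)\), and the average weight within a covariate group estimates the probability of being susceptible, with no model for the survival times of susceptible subjects.

```python
import math
import random


def km_censoring_left(y, delta):
    """Kaplan-Meier estimate of S_C(t-) = P(C >= t) at each observed time.

    Roles are swapped relative to the usual estimator: a censored
    observation (delta == 0) counts as an "event" for the censoring time C.
    Assumes no tied times (true almost surely for continuous data).
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    s_left = [0.0] * n
    surv = 1.0
    for rank, i in enumerate(order):
        s_left[i] = surv              # value just before y[i]
        at_risk = n - rank            # subjects with Y >= y[i]
        if delta[i] == 0:             # C is observed at this time
            surv *= 1.0 - 1.0 / at_risk
    return s_left


def simulate(n, cure_prob, rng):
    """Hypothetical design: binary covariate, exponential T0 and C."""
    x, y, delta = [], [], []
    for _ in range(n):
        xi = rng.randint(0, 1)
        cured = rng.random() < cure_prob[xi]
        t = math.inf if cured else rng.expovariate(1.0)  # T0 ~ Exp(rate 1)
        c = rng.expovariate(0.3)                         # C independent of (T, X)
        x.append(xi)
        y.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return x, y, delta


rng = random.Random(2021)
cure_prob = {0: 0.2, 1: 0.5}          # assumed cure probabilities per group
x, y, delta = simulate(20000, cure_prob, rng)

s_left = km_censoring_left(y, delta)
# IPCW weights: zero for censored subjects, 1 / S_C(Y-) for observed events.
weights = [d / s if d else 0.0 for d, s in zip(delta, s_left)]

est = {}
for g in (0, 1):
    w_g = [w for w, xi in zip(weights, x) if xi == g]
    est[g] = sum(w_g) / len(w_g)
    print(f"x={g}: IPCW estimate of 1 - pi(x) = {est[g]:.3f}, "
          f"truth = {1 - cure_prob[g]:.3f}")
```

In a regression setting the same weights would enter a likelihood for a parametric model of the cure probability, to which a penalty term could then be added; the group averages above are only the simplest instance.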
Additional information
Both authors acknowledge the support of the Irish Research Council and the French Ministry of Foreign Affairs through the Ulysses scheme. V. Patilea acknowledges support from the research program New Challenges for New Data of Fondation du Risque/ILB and LCL.
Appendix
Lemma 2
Let r(Y, X) be an integrable real-valued function. Under conditions (1) and (2),
Proof of Lemma 2
First, we have that
where \(\mathbb {E}(1-B \mid C, T_0, X ) = \mathbb {E}(1-B \mid X) = 1-\pi (X)\) follows from (2), and \(\mathbb {E}[\mathbb {1}(T_0 \le C) \mid T_0, X ] = S_C(T_0-\mid X)\) follows from (1). Thus,
as required. (The first equality holds since only the \(\Delta =1\) case contributes where \(Y=T_0\).) \(\square \)
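The chain of equalities used in this proof can be written out explicitly. The following is a sketch under assumptions that are not restated in this appendix: that \(Y = \min (T, C)\), that \(\Delta = (1-B)\,\mathbb {1}(T_0 \le C)\) (an event is observed only for a susceptible, uncensored subject), and that the quantity of interest is the conditional expectation of \(\Delta \, r(Y,X)\) weighted by \(1/S_C(Y-\mid X)\). Conditioning on \((T_0, X)\) and applying (2) and then (1) gives the first two lines; substituting \(Y = T_0\) on the event \(\Delta = 1\) and conditioning again gives the rest.

\[
\begin{aligned}
\mathbb {E}[\Delta \mid T_0, X]
  &= \mathbb {E}\big [\mathbb {E}(1-B \mid C, T_0, X)\,\mathbb {1}(T_0 \le C) \mid T_0, X\big ] \\
  &= \{1-\pi (X)\}\, S_C(T_0-\mid X), \\
\mathbb {E}\left[ \frac{\Delta \, r(Y,X)}{S_C(Y-\mid X)} \,\Big |\, X\right]
  &= \mathbb {E}\left[ \frac{\Delta \, r(T_0,X)}{S_C(T_0-\mid X)} \,\Big |\, X\right] \\
  &= \mathbb {E}\left[ \frac{\mathbb {E}(\Delta \mid T_0, X)\, r(T_0,X)}{S_C(T_0-\mid X)} \,\Big |\, X\right] \\
  &= \{1-\pi (X)\}\,\mathbb {E}[r(T_0,X) \mid X].
\end{aligned}
\]

Taking \(r \equiv 1\) in this form shows that the weight \(\Delta / S_C(Y-\mid X)\) has conditional mean \(1-\pi (X)\), which is why no assumption on the distribution of \(T_0\) is needed to identify \(\pi (X)\).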
Cite this article
Burke, K., Patilea, V. A likelihood-based approach for cure regression models. TEST 30, 693–712 (2021). https://doi.org/10.1007/s11749-020-00738-8
DOI: https://doi.org/10.1007/s11749-020-00738-8