A likelihood-based approach for cure regression models

  • Original Paper
  • Journal: TEST

Abstract

We propose a new likelihood-based approach for estimation, inference and variable selection for parametric cure regression models in time-to-event analysis under random right-censoring. In this context, it often happens that some subjects are “cured”, i.e., they will never experience the event of interest. Then, the sample of censored observations is an unlabeled mixture of cured and “susceptible” subjects. Using inverse probability censoring weighting (IPCW), we propose a likelihood-based estimation procedure for the cure regression model without making assumptions about the distribution of survival times for the susceptible subjects. The IPCW approach does require a preliminary estimate of the censoring distribution, for which general parametric, semi- or nonparametric approaches can be used. The incorporation of a penalty term in our estimation procedure is straightforward; in particular, we propose \(\ell _1\)-type penalties for variable selection. Our theoretical results are derived under mild assumptions. Simulation experiments and real data analysis illustrate the effectiveness of the new approach.
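The IPCW idea described above can be sketched in a few lines. The following is only an illustrative sketch under assumptions that the abstract does not specify: a logistic form for the cure probability \(\pi (x)\), a single covariate, a covariate-free Kaplan–Meier estimate of the censoring distribution, and no \(\ell _1\) penalty. The weighted objective is motivated by Lemma 2 in the Appendix with \(r \equiv 1\); the function names (`censoring_survival_left`, `ipcw_weights`, `fit_cure_logistic`, `simulate`) and the simulated data are hypothetical and are not the authors' implementation.

```python
import math
import random


def censoring_survival_left(y, delta):
    """Kaplan-Meier estimate of S_C(y_i-) for each i, ignoring covariates.

    Censored observations (delta == 0) are the 'events' for the censoring
    distribution. Assumes no ties among the observed times.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    s_left = [1.0] * n
    surv = 1.0
    for rank, i in enumerate(order):
        s_left[i] = surv                  # left limit at y[i]
        if delta[i] == 0:                 # a censoring occurs at y[i]
            surv *= 1.0 - 1.0 / (n - rank)
    return s_left


def ipcw_weights(y, delta):
    """w_i = delta_i / S_C(y_i-): a pseudo-indicator of being susceptible."""
    s_left = censoring_survival_left(y, delta)
    return [d / s for d, s in zip(delta, s_left)]


def fit_cure_logistic(x, w, iters=25):
    """Maximise sum_i [w_i*log(1 - pi_i) + (1 - w_i)*log(pi_i)] by Newton's
    method, where pi_i = expit(b0 + b1*x_i) is the assumed cure probability."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, wi in zip(x, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = (1.0 - wi) - p            # score contribution
            v = p * (1.0 - p)             # negative-Hessian weight
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(n, b0, b1, seed):
    """Toy data: cured subjects never fail; susceptibles have Exp(1) times;
    censoring is Exp(0.3) and independent of everything else."""
    rng = random.Random(seed)
    x, y, delta = [], [], []
    for _ in range(n):
        xi = rng.uniform(-1.0, 1.0)
        cured = rng.random() < 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        t0 = math.inf if cured else rng.expovariate(1.0)
        c = rng.expovariate(0.3)
        x.append(xi)
        y.append(min(t0, c))
        delta.append(1 if t0 <= c else 0)
    return x, y, delta


x, y, delta = simulate(n=4000, b0=-0.5, b1=1.0, seed=1)
w = ipcw_weights(y, delta)
b0_hat, b1_hat = fit_cure_logistic(x, w)
print(round(b0_hat, 2), round(b1_hat, 2))
```

Note that the fitting step never uses a model for the survival times of susceptible subjects: the event times enter only through the weights. An \(\ell _1\)-type penalty could be subtracted from the objective in `fit_cure_logistic`, although a different optimiser would then be needed.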


Author information

Corresponding author

Correspondence to Kevin Burke.

Additional information

Both authors acknowledge the support of the Irish Research Council and the French Ministry of Foreign Affairs through the Ulysses scheme. V. Patilea acknowledges support from the research program New Challenges for New Data of Fondation du Risque/ILB and LCL.

Appendix

Lemma 2

Let \(r(Y,X)\) be an integrable real-valued function. Under conditions (1) and (2),

$$\begin{aligned} \mathbb {E}\left( \left. \frac{\Delta r(Y,X)}{S_C(Y-\mid X)} ~\right| X\right) ~=~ \mathbb {E}\{r(T_0,X)\mid X\} \{1-\pi (X)\}. \end{aligned}$$
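As an illustration, taking \(r \equiv 1\) in the identity above (a direct substitution, written out here for convenience) gives

$$\begin{aligned} \mathbb {E}\left( \left. \frac{\Delta }{S_C(Y-\mid X)} ~\right| X\right) ~=~ 1-\pi (X), \end{aligned}$$

so the weighted indicator \(\Delta / S_C(Y-\mid X)\) has the same conditional mean given \(X\) as \(1-B\), even though \(B\) itself is not observed for censored subjects.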

Proof of Lemma 2

First, we have that

$$\begin{aligned} \mathbb {E}(\Delta \mid T_0, X )&= \mathbb {E}\{\mathbb {1}(T_0 \le C) (1-B) \mid T_0, X \} \\&= \mathbb {E}\{\mathbb {1}(T_0 \le C)\,\mathbb {E}(1-B \mid C, T_0, X ) \mid T_0, X \}\\&= S_C(T_0-\mid X) \,\{1-\pi (X)\} , \end{aligned}$$

where \(\mathbb {E}(1-B \mid C, T_0, X ) = \mathbb {E}(1-B \mid X) = 1-\pi (X)\) follows from (2), and \(\mathbb {E}[\mathbb {1}(T_0 \le C) \mid T_0, X ] = S_C(T_0-\mid X)\) follows from (1). Thus,

$$\begin{aligned}&\mathbb {E}\left( \left. \frac{\Delta r(Y,X)}{S_C(Y-\mid X)} ~\right| X\right) \\&= \mathbb {E}\left( \left. \frac{\Delta r(T_0,X)}{S_C(T_0-\mid X)} ~\right| X\right) \\&= \mathbb {E}\left\{ \left. \frac{ r(T_0,X)}{S_C(T_0-\mid X)} \mathbb {E}\left( \left. \Delta ~\right| T_0, X\right) ~\right| X\right\} \\&=\mathbb {E}\{r(T_0,X)\mid X\} \{1-\pi (X)\}, \end{aligned}$$

as required. The first equality holds because only observations with \(\Delta =1\) contribute, and \(Y=T_0\) for those observations. \(\square \)
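The identity in Lemma 2 can also be checked numerically. The sketch below is a Monte Carlo illustration for a single fixed covariate value, under assumptions chosen purely for convenience and not taken from the paper: \(T_0\) is exponential with rate 1, \(C\) is exponential with rate 0.5, \(B\) is Bernoulli with \(\pi = 0.3\), all three are independent, and \(r(t,x) = e^{-t}\). In that setting \(S_C(t-) = e^{-0.5t}\) and \(\mathbb {E}\{r(T_0,X)\mid X\} = 1/2\), so the right-hand side equals 0.35. The function name `lemma2_monte_carlo` is hypothetical.

```python
import math
import random


def lemma2_monte_carlo(n=100_000, pi_cure=0.3, cens_rate=0.5, seed=0):
    """Compare both sides of Lemma 2 for one fixed covariate value.

    Illustrative assumptions: T_0 ~ Exp(1), C ~ Exp(cens_rate),
    B ~ Bernoulli(pi_cure), all independent, and r(t, x) = exp(-t).
    Then S_C(t-) = exp(-cens_rate * t) and E{r(T_0, X) | X} = 1/2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        cured = rng.random() < pi_cure        # B = 1
        t0 = rng.expovariate(1.0)             # latent time of a susceptible
        c = rng.expovariate(cens_rate)        # censoring time
        delta = (not cured) and (t0 <= c)     # Delta = 1(T_0 <= C)(1 - B)
        if delta:
            y = t0                            # Y = T_0 whenever Delta = 1
            total += math.exp(-y) / math.exp(-cens_rate * y)
    lhs = total / n                           # average of Delta*r(Y)/S_C(Y-)
    rhs = 0.5 * (1.0 - pi_cure)               # E{r(T_0)} * (1 - pi)
    return lhs, rhs


lhs, rhs = lemma2_monte_carlo()
print(f"IPCW average: {lhs:.4f}, E[r(T_0)](1 - pi): {rhs:.4f}")
```

The true censoring survival function is used here rather than an estimate, so the check isolates the identity itself from the error of a preliminary censoring estimator.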

Cite this article

Burke, K., Patilea, V. A likelihood-based approach for cure regression models. TEST 30, 693–712 (2021). https://doi.org/10.1007/s11749-020-00738-8
