Statistical analysis of clustered mixed recurrent-event data with application to a cancer survivor study

Lifetime Data Analysis

Abstract

In long-term follow-up studies of recurrent events, the observation pattern may not be consistent over time. During some observation periods, subjects may be monitored continuously, so that each event occurrence time is known. During other periods, subjects may be monitored only at discrete times, so that only the number of events in each period is known. This results in mixed recurrent-event and panel-count data. In such data, events within a subject are dependent. Furthermore, if the data are collected from multiple centers, there is another level of dependence among subjects within a center. Literature exists for clustered recurrent-event data, but not for clustered mixed recurrent-event and panel-count data. Ignoring the cluster effect may lead to less efficient analysis. In this paper, we present a marginal modeling approach that takes the cluster effect into account, and we provide the asymptotic distribution of the resulting regression parameter estimators. Our simulation study demonstrates that this approach works well in practical situations. We apply it to a study comparing hospitalization rates between childhood cancer survivors and healthy controls, with data collected from 26 medical institutions across North America over more than 20 years of follow-up.

References

  • Cai J, Schaubel DE (2004) Marginal means/rates models for multiple type recurrent event data. Lifetime Data Anal 10:121–138

  • Cheng SC, Wei LJ (2000) Inferences for a semiparametric model with panel data. Biometrika 87(1):89–97

  • Cook RJ, Lawless JF (2007) The analysis of recurrent event data. Springer, New York

  • Fang S, Zhang HX, Sun LQ, Wang DH (2017) Analysis of panel count data with time-dependent covariates and informative observation process. Acta Math Appl Sin Engl Ser 33(1):147–156

  • He H, Pan D, Sun L, Li Y, Robison LL, Song X (2017) Analysis of a fixed center effect additive rates model for recurrent event data. Comput Stat Data Anal 112:186–197

  • Lawless JF, Nadeau C (1995) Some simple robust methods for the analysis of recurrent events. Technometrics 37:158–168

  • Li S, Sun Y, Huang CY, Follmann DA, Krause R (2016) Recurrent event data analysis with intermittently observed time varying covariates. Stat Med 35(18):3049–3065

  • Lin DY, Wei LJ, Yang I, Ying Z (2000) Semiparametric regression for the mean and rate functions of recurrent events. J R Stat Soc Ser B 62:711–730

  • Liu D, Kalbfleisch JD, Schaubel DE (2014) Methods for estimating center effects on recurrent events. Stat Biosci 6(1):19–37

  • Pepe MS, Cai J (1993) Some graphic displays and marginal regression analyses for recurrent failure times and time dependent covariates. J Am Stat Assoc 88:811–820

  • Pollard D (1990) Empirical processes: theory and applications. In: NSF-CBMS regional conference series in probability and statistics, pp i–86. Institute of Mathematical Statistics and the American Statistical Association

  • Schaubel DE, Cai J (2005a) Analysis of clustered recurrent-event data with application to hospitalization rates among renal failure patients. Biostatistics 6:404–419

  • Schaubel DE, Cai J (2005b) Semiparametric methods for clustered recurrent event data. Lifetime Data Anal 11(3):405–425

  • Sun J, Zhao X (2013) The statistical analysis of panel count data. Springer, New York

  • Sun L, Zhu L, Sun J (2009) Regression analysis of multivariate recurrent event data with time-varying covariate effects. J Multivar Anal 100(10):2214–2223

  • Wang MC, Chen YQ (2000) Nonparametric and semiparametric trend analysis of stratified recurrent time data. Biometrics 56:789–794

  • Wang Y, Yu Z (2019) A kernel regression model for panel count data with time-varying coefficients. arXiv preprint arXiv:1903.10233

  • Yu Z, Liu L, Bravata DM, Williams LS, Tepper RS (2013) A semiparametric recurrent events model with time varying coefficients. Stat Med 32(6):1016–1026

  • Zhu L, Zhao H, Tong X, Sun J, Srivastava DK, Leisenring W, Robison LL (2013) Statistical analysis of mixed recurrent event data with application to cancer survivor study. Stat Med 32(11):1954–1963

  • Zhu L, Tong X, Sun J, Chen M, Srivastava DK, Leisenring W, Robison LL (2014) Regression analysis of mixed recurrent-event and panel-count data. Biostatistics 15(3):555–568

  • Zhu L, Zhao H, Sun J, Leisenring W, Robison LL (2015) Regression analysis of mixed recurrent-event and panel-count data with additive rate models. Biometrics 71(1):71–79

  • Zhu L, Zhang Y, Li Y, Sun J, Robison LL (2017) A semiparametric likelihood-based method for regression analysis of mixed panel-count data. Biometrics 74:488–497

Acknowledgements

The work was partly supported by NIH Grants (R21CA198641; R03CA219450) to Zhu.

Author information

Corresponding author

Correspondence to Yimei Li.

Ethics declarations

Conflict of interest

The authors have declared no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Material

In this Supplementary Material, we sketch the proof of the asymptotic properties of the proposed estimator \(\widehat{\varvec{\beta }}\). The proof requires the following regularity conditions.

(C1) The samples \(\{ \, N_{i} (\cdot ) , O_{i} (\cdot ) , C_i , \mathbf{Z}_i ; i = 1 , \ldots , n\}\) are independent and identically distributed.

(C2) The \(N_{i}(\tau )\)’s are bounded almost surely.

(C3) Each component of \(\mathbf{Z}_i\) is bounded and \(P(C_i\ge \tau |\mathbf{Z}_i)>0\).

(C4) The data type random process \(r_i (t)\) (\(t \in [0,\tau ]\)) is conditionally independent of \(N^*_{ik}(t)\), \(H_i(t)\), and \(C_i\) given \(\mathbf{Z}_i\), and \(0<\inf _{\mathbf{z},\,0\le t\le \tau } P(r_i(t)=1|\mathbf{Z}_i=\mathbf{z} )<\sup _{\mathbf{z},\,0\le t\le \tau } P(r_i(t)=1|\mathbf{Z}_i=\mathbf{z} )<1\).

(C5) The matrix

$$\begin{aligned} {{\varvec{\Omega }}} = E\left\{ \sum _{k=1}^K \bigg [\int _0^{\tau }r_i(t) \big \{ \mathbf{Z}_i-\bar{\mathbf{z}}_{rk}({\varvec{\beta }}, t)\big \}^{\otimes 2}dN_{ik}(t) +\{1-r_i(t)\} \big \{\mathbf{Z}_i-\bar{\mathbf{z}}_{pk}({\varvec{\beta }}, t)\big \}^{\otimes 2}d\tilde{N}_{ik}(t)\bigg ] \right\} \end{aligned}$$

is positive definite, where \(\bar{\mathbf{z}}_{rk}({\varvec{\beta }}, t)\) and \(\bar{\mathbf{z}}_{pk}({\varvec{\beta }}, t)\) denote the limits of \(\overline{\mathbf{Z}}_{rk}(t)\) and \(\overline{\mathbf{Z}}_{pk}(t)\), respectively.

First, we show that \(\hat{\varvec{\beta }}\) is consistent and that, for large n, the equation \(U ({\varvec{\beta }})=0\) has a unique solution \(\hat{{\varvec{\beta }}}\). To this end, consider

$$\begin{aligned} D({\varvec{\beta }})= & {} \frac{1}{n} \sum _{i=1}^n \sum _{k=1}^K \displaystyle \int _0^\tau \bigg \{ r_i(t) Y_{ik}(t) \bigg [( {\varvec{\beta }}-{\varvec{\beta }}_0)^{T}{} \mathbf{Z}_i(t) -\log \bigg \{\frac{\mathbf{S}_{rk}^{(0)} ({\varvec{\beta }};t)}{\mathbf{S}_{rk}^{(0)} ({\varvec{\beta }}_0;t)}\bigg \} \bigg ] d N_{ik}(t)\\&\quad +\,(1- r_i(t)) \varDelta _{ik}(t) \bigg [( {\varvec{\beta }}-{\varvec{\beta }}_0)^{T}{} \mathbf{Z}_i(t) -\log \bigg \{\frac{\mathbf{S}_{pk}^{(0)} ({\varvec{\beta }};t)}{\mathbf{S}_{pk}^{(0)} ({\varvec{\beta }}_0;t)}\bigg \} \bigg ] d\tilde{N}_{ik}(t)\bigg \}\,. \end{aligned}$$

Note that \(\partial D({\varvec{\beta }})/\partial {\varvec{\beta }} = n^{-1}U({\varvec{\beta }})\) and that \( D({\varvec{\beta }})\) converges to a concave function \(d({\varvec{\beta }})\). By an argument similar to that of Lin et al. (2000) and Cheng and Wei (2000), one can show that \(\hat{{\varvec{\beta }}}\) converges to \({\varvec{\beta }}_0\) almost surely.
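Because the limit of \(D({\varvec{\beta }})\) is concave, a root of the estimating equation can in principle be found by Newton-Raphson iteration. The following is a minimal sketch of that idea in a deliberately simplified setting, not the estimator of this paper: it assumes a single event type (K = 1), continuous observation throughout (\(r_i(t)\equiv 1\)), a scalar time-independent covariate, and a unit baseline rate. All function names and simulation settings are hypothetical.

```python
import numpy as np


def simulate(n=1000, beta0=0.5, tau=1.0, seed=1):
    """Simulate recurrent events under rate exp(beta0 * Z) with baseline mu_0(t) = t.

    Assumed (illustrative) design: binary covariate, censoring independent of events.
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n).astype(float)               # covariate Z_i
    follow = np.minimum(rng.uniform(0.2, 1.5, size=n) * tau, tau)  # min(C_i, tau)
    counts = rng.poisson(np.exp(beta0 * z) * follow)          # events per subject
    event_owner = np.repeat(np.arange(n), counts)              # subject of each event
    # Given the count, homogeneous Poisson event times are uniform on [0, follow_i).
    event_times = rng.uniform(0.0, follow[event_owner])
    return z, follow, event_times, event_owner


def score_and_information(beta, z, follow, event_times, event_owner):
    """Return U(beta) and minus its derivative for the simplified setting."""
    # at_risk[e, j] = Y_j(t_e): subject j is still under observation at event time t_e
    at_risk = follow[None, :] >= event_times[:, None]
    w = at_risk * np.exp(beta * z)[None, :]
    s0 = w.sum(axis=1)
    s1 = (w * z[None, :]).sum(axis=1)
    s2 = (w * z[None, :] ** 2).sum(axis=1)
    zbar = s1 / s0                                             # weighted covariate mean
    score = np.sum(z[event_owner] - zbar)                      # sum over events of Z_i - Zbar
    info = np.sum(s2 / s0 - zbar ** 2)                         # minus dU/dbeta
    return score, info


def solve_score(z, follow, event_times, event_owner, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration for U(beta) = 0, started at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, z, follow, event_times, event_owner)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    data = simulate()
    print("estimated beta:", solve_score(*data))
```

In this sketch the information term is a sum of weighted variances of the covariate among subjects at risk, so it is nonnegative, which is the finite-sample counterpart of the concavity used in the argument above.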

For the asymptotic normality, note that under the assumption that \(N_{ik}^*\), \(O_{ik}\) and \(C_{i}\) are mutually independent given \(\mathbf{Z}_i\), we have that

$$\begin{aligned}&E\{dN_{ik}(t)|\mathbf{Z}_i\} = E\{Y_{ik}(t)|\mathbf{Z}_i\}E\{d N^*_{ik}(t)|\mathbf{Z}_i\}\\&\quad = E\{Y_{ik}(t)|\mathbf{Z}_i\}\{e^{{{\varvec{\beta }}}_0^T \mathbf{Z}_{i}}d \mu _{0k} (t) \} = E\{Y_{ik}(t) e^{ {{\varvec{\beta }}}_0^T \mathbf{Z}_{i}} d \mu _{0k} (t)|\mathbf{Z}_i\} \, ,\\&E\{d\tilde{N}_{ik}(t)|\mathbf{Z}_i\} = E\{\varDelta _{ik}(t)|\mathbf{Z}_i\}E\{N^*_{ik}(t)|\mathbf{Z}_i\}E\{dO_{ik}(t)|\mathbf{Z}_i\}\\&\quad = E\{\varDelta _{ik}(t)|\mathbf{Z}_i\}\{e^{{{\varvec{\beta }}}_0^T \mathbf{Z}_{i}} \mu _{0k} (t) \} E\{d O_{ik}(t)|\mathbf{Z}_i\} \\&\quad = E\big \{\varDelta _{ik}(t) e^{{{\varvec{\beta }}}_0^T \mathbf{Z}_{i}} \mu _{0k} (t)\,d E \{ O_{ik}(t)\}|\mathbf{Z}_i\big \} \, . \end{aligned}$$

It follows that \(dM_{ik}^r(t)=dN_{ik}(t)-Y_{ik}(t)e^{{{\varvec{\beta }}}_0^T \mathbf{Z}_{i}}\ d \mu _{0k} (t)\) and

$$\begin{aligned} dM_{ik}^p(t)= & {} d\tilde{N}_{ik}(t)-\varDelta _{ik}(t)e^{{{\varvec{\beta }}}_0^T \mathbf{Z}_{i}}\mu _{0k} (t)d E \{ O_{ik}(t)\}\\= & {} d \tilde{N}_{ik}(t) \, - \, \varDelta _{ik} (t) \, e^{{\varvec{\beta }}_0^T \mathbf{Z}_{i} } \, d \varGamma _{0k} (t) \end{aligned}$$

are all mean-zero processes.
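The mean-zero property can be checked numerically in a toy setting. The sketch below is only an illustration under assumptions that are not taken from the paper: one event type, a binary covariate, baseline mean \(\mu _{0}(t)=t\), at-risk indicators \(Y_{i}(t)=\varDelta _{i}(t)=I(C_i\ge t)\), and a Poisson observation process with constant rate, so that \(dE\{O_{i}(t)\}\) equals that rate times \(dt\). The function name and all numerical settings are hypothetical.

```python
import numpy as np


def simulate_residuals(n=20000, beta0=0.5, tau=1.0, obs_rate=3.0, seed=0):
    """Return arrays (M_r, M_p) holding M^r_i(tau) and M^p_i(tau) for n subjects."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)                     # binary covariate Z_i
    rate = np.exp(beta0 * z)                           # e^{beta0 Z_i}, with mu_0(t) = t
    follow = np.minimum(rng.uniform(0.2, 1.5, size=n) * tau, tau)  # min(C_i, tau)
    m_r = np.empty(n)
    m_p = np.empty(n)
    for i in range(n):
        # N*_i on [0, follow_i]: homogeneous Poisson process with rate e^{beta0 Z_i}
        n_events = rng.poisson(rate[i] * follow[i])
        event_times = np.sort(rng.uniform(0.0, follow[i], size=n_events))
        # O_i: independent Poisson observation process with rate obs_rate (assumed)
        n_obs = rng.poisson(obs_rate * follow[i])
        obs_times = rng.uniform(0.0, follow[i], size=n_obs)
        # Recurrent-event residual: N_i(tau) minus int Y_i e^{beta0 Z_i} d mu_0
        m_r[i] = n_events - rate[i] * follow[i]
        # Panel-count residual: sum of N*_i at the observation times minus
        # int Delta_i e^{beta0 Z_i} mu_0(t) dE{O_i(t)} = rate * obs_rate * follow^2 / 2
        counts_at_obs = np.searchsorted(event_times, obs_times, side="right")
        m_p[i] = counts_at_obs.sum() - rate[i] * obs_rate * follow[i] ** 2 / 2.0
    return m_r, m_p


if __name__ == "__main__":
    m_r, m_p = simulate_residuals()
    print("mean of M^r:", m_r.mean())
    print("mean of M^p:", m_p.mean())
```

Both sample means should be close to zero relative to their Monte Carlo standard errors. If the observation process were allowed to depend on the event process, the factorization used in the derivation above would no longer hold and the panel-count residual would generally not be centered.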

By the Taylor series expansion, we have that

$$\begin{aligned}&\sqrt{n}\{\widehat{{\varvec{\beta }}}-{{\varvec{\beta }}}_0\}=\widehat{\varvec{\Omega }}^{-1}({\varvec{\beta }}^{*})\frac{1}{\sqrt{n}} U({\varvec{\beta }}_0)\\&\quad =\widehat{\varvec{\Omega }}^{-1}({\varvec{\beta }}^{*})\displaystyle \frac{1}{\sqrt{n}} \displaystyle \sum _{i=1}^n \sum _{k=1}^K \displaystyle \int _0^\tau \bigg [r_i(t)\Big \{\mathbf{Z}_i-\overline{\mathbf{Z}}_{rk}(t)\Big \}dM^r_{ik}(t)\\&\qquad +\,\{1-r_i(t)\}\Big \{\mathbf{Z}_i-\overline{\mathbf{Z}}_{pk}(t)\Big \}dM^p_{ik}(t) \bigg ], \end{aligned}$$

where \({{\varvec{\beta }}}^*\) is on the line segment between \(\widehat{{\varvec{\beta }}}\) and \({\varvec{\beta }}_0\). Both \(M_{ik}^r(t)\) and \(M_{ik}^p(t)\) are differences of two monotone functions in t. Thus, based on the results of Lin et al. (2000, Appendix A.2) or on empirical process theory (Pollard 1990, page 15), together with the multivariate central limit theorem, one can prove the tightness and weak convergence of \(\sqrt{n}\{\widehat{{\varvec{\beta }}}-{{\varvec{\beta }}}_0\}\) by using the consistency of \(\widehat{{\varvec{\beta }}}\) and \(\widehat{\varvec{\Omega }}({\varvec{\beta }}_0)\) for \({\varvec{\beta }}_0\) and \(\varvec{\Omega }\), the nonnegativity of \(r_i(t)\), and the boundedness of \(\mathbf{Z}_i\). The asymptotic normality thus follows.
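The first equality in the expansion can be unpacked in one more step. The sketch below assumes that \(\widehat{\varvec{\Omega }}({\varvec{\beta }})\) stands for \(-n^{-1}\partial U({\varvec{\beta }})/\partial {\varvec{\beta }}^{T}\), a definition this supplement uses but does not restate, and it writes \({\varvec{\Sigma }}\), a symbol introduced here only for illustration, for the limiting covariance matrix of \(n^{-1/2}U({\varvec{\beta }}_0)\). Since \(U(\widehat{{\varvec{\beta }}})=0\), expanding \(U\) around \({\varvec{\beta }}_0\) gives

$$\begin{aligned} 0 = U(\widehat{{\varvec{\beta }}}) = U({\varvec{\beta }}_0) - n\,\widehat{\varvec{\Omega }}({\varvec{\beta }}^{*})\,(\widehat{{\varvec{\beta }}}-{\varvec{\beta }}_0) \quad \Longrightarrow \quad \sqrt{n}\,(\widehat{{\varvec{\beta }}}-{\varvec{\beta }}_0) = \widehat{\varvec{\Omega }}^{-1}({\varvec{\beta }}^{*})\,\frac{1}{\sqrt{n}}\,U({\varvec{\beta }}_0)\,. \end{aligned}$$

If \(\widehat{\varvec{\Omega }}({\varvec{\beta }}^{*})\) converges in probability to \({\varvec{\Omega }}\) and \(n^{-1/2}U({\varvec{\beta }}_0)\) converges in distribution to \(N(\mathbf{0},{\varvec{\Sigma }})\), Slutsky's theorem then suggests a limiting normal distribution with mean zero and covariance of the sandwich form \({\varvec{\Omega }}^{-1}{\varvec{\Sigma }}{\varvec{\Omega }}^{-1}\). The explicit form of \({\varvec{\Sigma }}\) under clustering is not specified in this supplement and is left unspecified here as well.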

About this article

Cite this article

Zhu, L., Choi, S., Li, Y. et al. Statistical analysis of clustered mixed recurrent-event data with application to a cancer survivor study. Lifetime Data Anal 26, 820–832 (2020). https://doi.org/10.1007/s10985-020-09500-6

  • DOI: https://doi.org/10.1007/s10985-020-09500-6
