
A general piecewise multi-state survival model: application to breast cancer

  • Original Paper
Statistical Methods & Applications

Abstract

Multi-state models are used in survival analysis to model illnesses that evolve through several stages over time. They can be developed with several techniques, such as non-parametric and semi-parametric methods and stochastic processes, particularly Markov processes. When the development of an illness is analysed, its progression is tracked periodically: medical reviews take place at discrete times, so the observations form panel data. In this paper, a discrete-time piecewise non-homogeneous Markov process is constructed for modelling and analysing a multi-state illness with a general number of states. The model is built, and relevant measures are determined, such as the survival function, transition probabilities, mean total times spent in a group of states and the conditional probability of state change. A likelihood function is built to estimate the parameters and a general number of cut-points included in the model. Time-dependent covariates are introduced, the results are obtained in matrix-algebraic form and the algorithms are shown. The model is applied to analyse the behaviour of breast cancer, through a study of the relapse and survival times of 300 breast cancer patients who underwent mastectomy. The methods are implemented computationally in MATLAB and R.


Notes

  1. The degrees of freedom are given by 7 possible transitions (1 → 1, 1 → 2, 1 → 3, 1 → C, 2 → 2, 2 → 3, 2 → C), 3 periods, 8 groups of patients defined by treatment regimen, and 35 estimated parameters: (7 − 1) × (3 − 1) × (8 − 1) − 35 = 84 − 35 = 49.
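The arithmetic in Note 1 can be checked directly. The variable names in this short Python sketch are only labels for the quantities listed in the note.

```python
# Degrees of freedom for the goodness-of-fit comparison described in Note 1.
n_transitions = 7  # 1->1, 1->2, 1->3, 1->C, 2->2, 2->3, 2->C
n_periods = 3
n_groups = 8       # patient groups defined by treatment regimen
n_params = 35      # estimated parameters

cells_term = (n_transitions - 1) * (n_periods - 1) * (n_groups - 1)
df = cells_term - n_params
print(cells_term, df)  # 84 49
```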

Acknowledgements

Funding was provided by Ministerio de Economía y Competitividad (Grant No. FQM-307), European Regional Development Fund (ERDF) (Grant No. MTM2017-88708-P), University of Milano-Bicocca (Grant No. 2014-ATE-0228).

Author information

Corresponding author

Correspondence to Mariangela Zenga.

Appendices

Appendix A

The parameters of the model are estimated by maximum likelihood. These parameters are the matrices \({\mathbf{T}}_{u}\) (or the parameters inside these matrices), the regression covariate vectors \({\varvec{\upbeta}}^{u}\), for \(u = 1, \ldots ,k\), and the cut-points, all of them estimated jointly. We assume that n items are observed, all beginning in state 1, and that item i is observed at \(m_{i}\) change times, the last time being death or censoring. Because each item is observed at its change times, the value of the covariate vector and the corresponding state are known at each of these times. Therefore, a sequence of times, states and covariate vectors is obtained for each item i: \(0 = t_{i,1} < t_{i,2} < \cdots < t_{{i,m_{i} }}\), \(1 = x_{1}^{i} , \ldots , \, x_{{m_{i} }}^{i}\) and \({\mathbf{z}}_{{l_{1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{{m_{i} }} }}^{i}\), respectively, where \({\mathbf{z}}_{{l_{s} }}^{i}\) is the covariate vector for the interval that contains the time \(t_{i,s}\) for item i, \(s = 1, \ldots ,m_{i}\).
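The observed data for one item can be held in a small container that enforces the conditions just stated (start at time 0 in state 1, strictly increasing times) and yields the consecutive pairs that enter the likelihood. This is a minimal Python sketch; the class `ItemHistory` and its fields are hypothetical, and the usage values are made up (assuming state 2 = relapse and state 3 = death for illustration).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ItemHistory:
    """Observed history of one item i (hypothetical container, not from the paper).

    times[s - 1]      -> t_{i,s}: strictly increasing, times[0] == 0
    states[s - 1]     -> x^i_s: states[0] == 1, since every item starts in state 1
    covariates[s - 1] -> z^i_{l_s}: covariate vector of the interval containing t_{i,s}
    """
    times: List[int]
    states: List[int]
    covariates: List[Sequence[float]]

    def __post_init__(self) -> None:
        m = len(self.times)
        if not (len(self.states) == m == len(self.covariates)):
            raise ValueError("times, states and covariates must all have length m_i")
        if m == 0 or self.times[0] != 0 or self.states[0] != 1:
            raise ValueError("each item must start at time 0 in state 1")
        if any(a >= b for a, b in zip(self.times, self.times[1:])):
            raise ValueError("observation times must be strictly increasing")

    def transitions(self) -> List[Tuple[int, int, int, int]]:
        """Pairs (t_{i,s-1}, x_{s-1}, t_{i,s}, x_s) for s = 2, ..., m_i."""
        return [
            (self.times[s - 1], self.states[s - 1], self.times[s], self.states[s])
            for s in range(1, len(self.times))
        ]


# Usage with made-up values: change to state 2 at t = 4, to state 3 at t = 9.
item = ItemHistory(times=[0, 4, 9], states=[1, 2, 3], covariates=[[0.0], [0.0], [1.0]])
print(item.transitions())  # [(0, 1, 4, 2), (4, 2, 9, 3)]
```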

We assume k − 1 unknown positive integer cut-points, \(c_{0} = 0 < c_{1} < \cdots < c_{k - 1} < c_{k} = \infty\). The likelihood function for estimating the parameters is given by

$$L\left( {c_{1} , \ldots ,c_{k - 1} ,{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = 1, \ldots ,k} \right) = \prod\limits_{i = 1}^{n} {\prod\limits_{s = 2}^{{m_{i} }} {h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = 1, \ldots ,k} \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right)} } .$$
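Because the likelihood is a product of one factor per observed change, it is numerically safer to maximize its logarithm, which is a sum of log-factors. The following Python sketch assumes a hypothetical callable `h` that returns the factor h for one step at fixed parameter values and cut-points; it illustrates the structure of the product and is not the authors' MATLAB code.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

# One observed step for item i: (t_{i,s-1}, x_{s-1}, t_{i,s}, x_s, covariates in between).
Step = Tuple[int, int, int, int, Sequence]


def log_likelihood(items: Iterable[Sequence[Step]],
                   h: Callable[[int, int, int, int, Sequence], float]) -> float:
    """log L = sum over items i and steps s = 2..m_i of log h_{x_{s-1}, x_s}(...).

    `h` is a hypothetical callable already evaluated at the current
    parameters (T_u, beta^u) and cut-points.
    """
    total = 0.0
    for steps in items:
        for t_prev, x_prev, t_next, x_next, z in steps:
            value = h(t_prev, x_prev, t_next, x_next, z)
            if value <= 0.0:
                return -math.inf  # observation impossible under these parameters
            total += math.log(value)
    return total


# Usage: two made-up items with 2 and 1 observed changes, constant factor 0.5.
histories = [
    [(0, 1, 4, 2, ()), (4, 2, 9, 3, ())],
    [(0, 1, 7, 3, ())],
]
ll = log_likelihood(histories, lambda *step: 0.5)
```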

For the calculations, we define the intervals \(I_{q} = \left[ {c_{q - 1} ,c_{q} } \right[\) and \(J_{q} = \left] {c_{q - 1} ,c_{q} } \right]\), \(q = 1, \ldots ,k\). Let \(f_{x}^{q} \left( {t,{\mathbf{z}}_{q}^{i} ;{\mathbf{T}}_{q} ,{\varvec{\upbeta}}^{q} } \right)\) be the sojourn time probability in state x for t periods (the probability of remaining in state x during t consecutive periods), calculated by using the matrix \({\mathbf{P}}_{q} \left( {{\mathbf{z}}_{q}^{i} } \right)\). Since the item remains in state \(x_{s - 1}^{i}\) until \(t_{i,s}\), its state at any intermediate cut-point is known, and the factors in the likelihood function have the following expressions:
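The half-open intervals decide which matrix applies at each step: \(I_{q}\) locates the time at which a sojourn starts and \(J_{q}\) the time at which a change is observed. A minimal Python sketch, assuming the cut-points are stored as a sorted list `cuts = [c_1, ..., c_{k-1}]` (the function names are hypothetical), maps a time to its 1-based interval index with the standard-library `bisect` module.

```python
import bisect
from typing import Sequence


def check_cutpoints(cuts: Sequence[int]) -> None:
    """Validate 0 < c_1 < ... < c_{k-1} with integer cut-points (c_k = inf is implicit)."""
    prev = 0
    for c in cuts:
        if not isinstance(c, int) or c <= prev:
            raise ValueError("cut-points must be strictly increasing positive integers")
        prev = c


def interval_I(t: int, cuts: Sequence[int]) -> int:
    """Index q such that t lies in I_q = [c_{q-1}, c_q), with c_0 = 0."""
    return bisect.bisect_right(cuts, t) + 1


def interval_J(t: int, cuts: Sequence[int]) -> int:
    """Index q such that t lies in J_q = (c_{q-1}, c_q], for t > 0."""
    return bisect.bisect_left(cuts, t) + 1


cuts = [5, 10]  # made-up cut-points, k = 3 intervals
check_cutpoints(cuts)
print([interval_I(t, cuts) for t in (0, 4, 5, 9, 10, 30)])  # [1, 1, 2, 2, 3, 3]
print([interval_J(t, cuts) for t in (1, 5, 6, 10, 11)])     # [1, 1, 2, 2, 3]
```

A cut-point itself therefore belongs to the interval on its right when a sojourn starts there, but to the interval on its left when a change is observed there.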

  1.

    If \(t_{i,s - 1} \in I_{j}\) and \(t_{i,s} \in J_{j}\),

    $$h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right) = f_{{x_{s - 1}^{i} }}^{j} \left( {t_{i,s} - t_{i,s - 1} - 1,{\mathbf{z}}_{j}^{i} ;{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right)T_{{x_{s - 1}^{i} ,x_{s}^{i} }}^{j} \left( {{\mathbf{z}}_{j}^{i} } \right) .$$
  2.

    If \(t_{i,s - 1} \in I_{j - 1}\) and \(t_{i,s} \in J_{j}\),

    $$\begin{aligned} h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = j - 1,j} \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right) = & f_{{x_{s - 1}^{i} }}^{j - 1} \left( {c_{j - 1} - t_{i,s - 1} ,{\mathbf{z}}_{j - 1}^{i} ;{\mathbf{T}}_{j - 1} ,{\varvec{\upbeta}}^{j - 1} } \right) \\ & \quad \times f_{{x_{s - 1}^{i} }}^{j} \left( {t_{i,s} - c_{j - 1} - 1,{\mathbf{z}}_{j}^{i} ;{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right)T_{{x_{s - 1}^{i} ,x_{s}^{i} }}^{j} \left( {{\mathbf{z}}_{j}^{i} } \right). \\ \end{aligned}$$
  3.

    If \(t_{i,s - 1} \in I_{j} \;{\text{and}}\;t_{i,s} \in J_{q} \;{\text{with}}\;q - j \ge 2\),

    $$\begin{aligned} h_{{x_{s - 1}^{i} ,x_{s}^{i} }} \left( {\left. {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = j, \ldots ,q} \right|t_{i,s - 1} ,t_{i,s} ,{\mathbf{z}}_{{l_{s - 1} }}^{i} , \ldots ,{\mathbf{z}}_{{l_{s} }}^{i} } \right) = & f_{{x_{s - 1}^{i} }}^{j} \left( {c_{j} - t_{i,s - 1} ,{\mathbf{z}}_{j}^{i} ;{\mathbf{T}}_{j} ,{\varvec{\upbeta}}^{j} } \right) \\ & \quad \times \prod\limits_{u = j + 1}^{q - 1} {f_{{x_{s - 1}^{i} }}^{u} \left( {c_{u} - c_{u - 1} ,{\mathbf{z}}_{u}^{i} ;{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} } \right)} f_{{x_{s - 1}^{i} }}^{q} \left( {t_{i,s} - c_{q - 1} - 1,{\mathbf{z}}_{q}^{i} ;{\mathbf{T}}_{q} ,{\varvec{\upbeta}}^{q} } \right)T_{{x_{s - 1}^{i} ,x_{s}^{i} }}^{q} \left( {{\mathbf{z}}_{q}^{i} } \right). \\ \end{aligned}$$
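The three cases differ only in how many cut-points separate \(t_{i,s - 1}\) from \(t_{i,s}\), so they can be evaluated with a single loop over intervals. In every case the stay inside the final interval \(J_{q}\) runs from \(c_{q - 1}\) (or from \(t_{i,s - 1}\) if no cut-point intervenes) up to \(t_{i,s} - 1\), so the sojourn lengths always add up to \(t_{i,s} - t_{i,s - 1} - 1\). The Python sketch below rests on assumptions not taken from the paper: a hypothetical callable `P(q)` returns the covariate-adjusted one-step matrix \({\mathbf{P}}_{q} ({\mathbf{z}}_{q}^{i} )\) with states indexed from 0, the sojourn probability is taken as a power of its diagonal entry, and the final transition probability is read from the same matrix. The numeric matrices are made up.

```python
import bisect
from typing import Callable, Sequence

import numpy as np


def h_factor(t_prev: int, t_next: int, x: int, y: int, cuts: Sequence[int],
             P: Callable[[int], np.ndarray]) -> float:
    """Stay in x from t_prev to t_next - 1, then move to y at t_next (t_prev < t_next).

    Illustrative assumptions: P(q) is the one-step matrix of interval q (1-based),
    f^q_x(d) = P(q)[x, x] ** d, and the transition probability is P(q)[x, y].
    """
    bounds = [0, *cuts]                       # c_0, c_1, ..., c_{k-1}
    j = bisect.bisect_right(cuts, t_prev) + 1  # t_prev in I_j
    q = bisect.bisect_left(cuts, t_next) + 1   # t_next in J_q
    prob, start = 1.0, t_prev
    for u in range(j, q):                      # intervals crossed before J_q
        end = bounds[u]                        # c_u
        prob *= P(u)[x, x] ** (end - start)
        start = end
    prob *= P(q)[x, x] ** (t_next - start - 1)  # remaining stay inside J_q
    return float(prob * P(q)[x, y])


# Made-up 3-state matrices (last state absorbing), one per interval.
mats = {
    1: np.array([[0.90, 0.07, 0.03], [0.0, 0.85, 0.15], [0.0, 0.0, 1.0]]),
    2: np.array([[0.95, 0.04, 0.01], [0.0, 0.80, 0.20], [0.0, 0.0, 1.0]]),
    3: np.array([[0.97, 0.02, 0.01], [0.0, 0.75, 0.25], [0.0, 0.0, 1.0]]),
}
# Case 3: start at 3 in I_1, change observed at 12 in J_3 (cut-points 5 and 10).
value = h_factor(3, 12, 0, 1, [5, 10], mats.__getitem__)
```

With one matrix for all intervals the function reduces to the homogeneous factor \(p_{xx}^{t_{i,s} - t_{i,s - 1} - 1} p_{xy}\).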

The likelihood function is maximized subject to several restrictions. The matrices \({\mathbf{P}}_{q}\) and \({\mathbf{P}}_{q} \left( {{\mathbf{z}}_{q}^{i} } \right)\) associated with the model must be stochastic matrices for any covariate vector \({\mathbf{z}}_{q}^{i}\). This restriction ensures that no probability is less than zero or greater than one, whatever the values of the parameters.
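The stochasticity restriction can be checked numerically. The Python sketch below, with hypothetical function names, tests whether a matrix is stochastic and also shows one common alternative to explicit constraints: a row-wise softmax (multinomial-logit) map, which yields a stochastic matrix for any real parameter values. That map is a swapped-in technique, not necessarily the parametrization used in the paper.

```python
import numpy as np


def is_stochastic(P, tol: float = 1e-10) -> bool:
    """True if P is square, every entry lies in [0, 1] and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return (P.ndim == 2 and P.shape[0] == P.shape[1]
            and bool(np.all(P >= -tol)) and bool(np.all(P <= 1 + tol))
            and bool(np.allclose(P.sum(axis=1), 1.0, atol=tol)))


def softmax_rows(eta) -> np.ndarray:
    """Map any real matrix to a row-stochastic matrix (multinomial-logit rows)."""
    eta = np.asarray(eta, dtype=float)
    e = np.exp(eta - eta.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)


# Arbitrary unconstrained parameters always give a stochastic matrix.
P = softmax_rows(np.array([[0.0, 1.0, -1.0], [2.0, 0.5, 0.0], [0.0, 0.0, 0.0]]))
print(is_stochastic(P))                                   # True
print(is_stochastic(np.array([[0.5, 0.6], [0.0, 1.0]])))  # False: first row sums to 1.1
```

Softmax rows are strictly positive, so structural zeros, such as an absorbing death state or forbidden transitions, would have to be fixed separately; this is one reason explicit constraints can be preferable.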

The cut-points are then estimated: the optimal values \(c_{1} , \ldots ,c_{k - 1}\) are those that satisfy

$$c_{1} , \ldots ,c_{k - 1} \in \mathbb{N}\,{\text{such}}\,{\text{that}}\,L\left( {c_{1} , \ldots ,c_{k - 1} ,{\hat{\mathbf{T}}}_{u}^{{c_{1} , \ldots ,c_{k - 1} }} ,{\hat{\mathbf{\beta }}}_{u}^{{c_{1} , \ldots ,c_{k - 1} }} ,u = 1, \ldots ,k} \right) = \max_{{v_{1} , \ldots ,v_{k - 1} }} \left\{ {L\left( {v_{1} , \ldots ,v_{k - 1} ,{\hat{\mathbf{T}}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,{\hat{\mathbf{\beta }}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,u = 1, \ldots ,k} \right)} \right\} ,$$

subject to \(0 < v_{1} < v_{2} < \cdots < v_{k - 1} < \max_{i} \left\{ {t_{{i,m_{i} }} } \right\}\), where each \(v_{j}\) is a natural number. Here \(\left( {{\hat{\mathbf{T}}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,{\hat{\mathbf{\beta }}}_{u}^{{v_{1} , \ldots ,v_{k - 1} }} ,u = 1, \ldots ,k} \right)\) are the maximum likelihood estimates of \(\left( {{\mathbf{T}}_{u} ,{\varvec{\upbeta}}^{u} ,u = 1, \ldots ,k} \right)\) for fixed cut-points \(v_{1} , \ldots ,v_{k - 1}\).
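Because the cut-points are integers, the outer maximization over \(v_{1} , \ldots ,v_{k - 1}\) can be done by enumeration: fit the remaining parameters for each admissible choice and keep the best. This Python sketch assumes a hypothetical `profile_loglik(v)` that performs the inner maximum likelihood fit for fixed cut-points and returns the maximized log-likelihood; exhaustive enumeration is an illustrative strategy, and the source does not state how the search was carried out.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple


def best_cutpoints(k: int, max_time: int,
                   profile_loglik: Callable[[Tuple[int, ...]], float]
                   ) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Search all integer tuples 0 < v_1 < ... < v_{k-1} < max_time.

    `profile_loglik(v)` is hypothetical: it fits (T_u, beta^u) for fixed
    cut-points v and returns the maximized log-likelihood.
    """
    best_v, best_ll = None, float("-inf")
    # combinations() yields strictly increasing tuples, matching the ordering constraint
    for v in combinations(range(1, max_time), k - 1):
        ll = profile_loglik(v)
        if ll > best_ll:
            best_v, best_ll = v, ll
    return best_v, best_ll


# Toy profile log-likelihood with a known maximum at (3, 7); k = 3 intervals.
v_hat, ll_hat = best_cutpoints(3, 10, lambda v: -(v[0] - 3) ** 2 - (v[1] - 7) ** 2)
print(v_hat, ll_hat)  # (3, 7) 0
```

The number of candidate tuples is \(\binom{\max_{i} t_{i,m_{i}} - 1}{k - 1}\), so exhaustive search is practical only for a few cut-points.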

The likelihood function has been implemented in MATLAB and is maximized with the MATLAB function fmincon, which finds the minimum of a constrained nonlinear multivariable function; here its interior-point algorithm is used.
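For readers working in Python, `scipy.optimize.minimize` can play a similar role: since it minimizes, the likelihood is maximized by minimizing the negative log-likelihood. This toy sketch uses made-up sojourn data and a single geometric stay probability. It uses method `"L-BFGS-B"`, a bound-constrained quasi-Newton method, which is a swapped-in technique and not the interior-point algorithm; SciPy's `"trust-constr"` method is the option that handles general constraints with an interior-point approach.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data: observed sojourn lengths, in periods, in a single state.
durations = np.array([1, 2, 3, 4])


def neg_loglik(theta):
    p = theta[0]  # probability of staying one more period
    # geometric sojourn: P(D = d) = p**(d - 1) * (1 - p)
    return -np.sum((durations - 1) * np.log(p) + np.log(1.0 - p))


res = minimize(neg_loglik, x0=[0.5], method="L-BFGS-B",
               bounds=[(1e-6, 1.0 - 1e-6)])
p_hat = res.x[0]
# The closed-form MLE here is 1 - n / sum(d) = 1 - 4/10 = 0.6.
print(p_hat)
```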

Appendix B

See Tables 11 and 12.

Table 11 Contingency table of observed and expected counts for the homogeneous model
Table 12 Contingency table of observed and expected counts for the piecewise model

About this article

Cite this article

Ruiz-Castro, J.E., Zenga, M. A general piecewise multi-state survival model: application to breast cancer. Stat Methods Appl 29, 813–843 (2020). https://doi.org/10.1007/s10260-019-00505-6
