Abstract
Multi-state models are used in survival analysis to model illnesses that evolve through several stages over time. They can be developed with several techniques, such as non-parametric and semi-parametric methods and stochastic processes, particularly Markov processes. When the development of an illness is analysed, its progression is tracked periodically: medical reviews take place at discrete times, so the observations form panel data. In this paper, a discrete-time piecewise non-homogeneous Markov process is constructed for modelling and analysing a multi-state illness with a general number of states. Relevant measures, such as the survival function, the transition probabilities, the mean total times spent in a group of states and the conditional probability of state change, are determined. A likelihood function is built to estimate the parameters and an arbitrary number of cut-points included in the model. Time-dependent covariates are introduced, the results are obtained in matrix-algebraic form and the algorithms are given. The model is applied to analyse the behaviour of breast cancer through a study of the relapse and survival times of 300 breast cancer patients who underwent mastectomy. The results of this paper are implemented computationally in MATLAB and R.
Notes
The degrees of freedom are given by 7 possible transitions (1 → 1, 1 → 2, 1 → 3, 1 → C, 2 → 2, 2 → 3, 2 → C), 3 periods, 8 groups of patients divided by treatment regimen and 35 estimated parameters: (7 − 1) × (3 − 1) × (8 − 1) − 35 = 84 − 35 = 49.
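The arithmetic in this note can be checked with a short Python sketch. The function name `degrees_of_freedom` is illustrative; the formula simply mirrors the expression above, (transitions − 1) × (periods − 1) × (groups − 1) minus the number of estimated parameters.

```python
def degrees_of_freedom(n_transitions: int, n_periods: int, n_groups: int,
                       n_params: int) -> int:
    """Mirror the note: (transitions-1)(periods-1)(groups-1) - parameters."""
    cells = (n_transitions - 1) * (n_periods - 1) * (n_groups - 1)
    return cells - n_params


print(degrees_of_freedom(7, 3, 8, 35))  # 49
```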
Acknowledgements
Funding was provided by the Ministerio de Economía y Competitividad (Grant No. FQM-307), the European Regional Development Fund (ERDF) (Grant No. MTM2017-88708-P) and the University of Milano-Bicocca (Grant No. 2014-ATE-0228).
Appendices
Appendix A
The parameters of the model are estimated by maximum likelihood. These parameters are the matrices \(\mathbf{T}_{u}\) (or the parameters inside these matrices), the regression covariate vectors \(\varvec{\upbeta}^{u}\), for \(u = 1, \ldots, k\), and the cut-points, all of which are estimated jointly. We assume that n items are observed, all beginning in state 1, and that item i is observed at \(m_{i}\) change times, the last of which corresponds to death or censoring. Since each item is observed at its change times, the state and the value of the covariate vector are recorded at each of them. Therefore, for each item i we obtain a sequence of times, states and covariate vectors: \(0 = t_{i,1} < t_{i,2} < \cdots < t_{i,m_{i}}\), \(1 = x_{1}^{i}, \ldots, x_{m_{i}}^{i}\) and \(\mathbf{z}_{l_{1}}^{i}, \ldots, \mathbf{z}_{l_{m_{i}}}^{i}\), respectively, where \(\mathbf{z}_{l_{s}}^{i}\) is the covariate vector for the interval that contains the time \(t_{i,s}\), for \(s = 1, \ldots, m_{i}\).
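One way to hold the observed data for a single item is a small container that enforces the conditions stated above: first time 0, first state 1, strictly increasing times and one covariate vector per observation. The class name `ItemHistory` and its fields are hypothetical; the `transitions` method yields the consecutive pairs from which the likelihood factors are built.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ItemHistory:
    """Observed history of item i (hypothetical container, not from the paper)."""

    times: List[int]                   # 0 = t_{i,1} < t_{i,2} < ... < t_{i,m_i}
    states: List[int]                  # 1 = x_1^i, ..., x_{m_i}^i
    covariates: List[Sequence[float]]  # z_{l_1}^i, ..., z_{l_{m_i}}^i

    def __post_init__(self) -> None:
        m = len(self.times)
        if m == 0 or len(self.states) != m or len(self.covariates) != m:
            raise ValueError("times, states and covariates need m_i >= 1 entries each")
        if self.times[0] != 0 or self.states[0] != 1:
            raise ValueError("every item starts at time 0 in state 1")
        if any(a >= b for a, b in zip(self.times, self.times[1:])):
            raise ValueError("observation times must be strictly increasing")

    def transitions(self) -> List[Tuple[int, int, int, int]]:
        """Tuples (t_{i,s-1}, t_{i,s}, x_{s-1}^i, x_s^i) for s = 2, ..., m_i."""
        return [
            (self.times[s - 1], self.times[s], self.states[s - 1], self.states[s])
            for s in range(1, len(self.times))
        ]


item = ItemHistory(times=[0, 4, 9], states=[1, 2, 3], covariates=[[1.0], [1.0], [0.0]])
print(item.transitions())  # [(0, 4, 1, 2), (4, 9, 2, 3)]
```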
We assume k − 1 unknown positive integer cut-points, \(c_{0} = 0 < c_{1} < \cdots < c_{k-1} < c_{k} = \infty\). The likelihood function for estimating the parameters is given by
For the calculations, we define the intervals \(I_{q} = \left[ c_{q-1}, c_{q} \right[\) and \(J_{q} = \left] c_{q-1}, c_{q} \right]\), for \(q = 1, \ldots, k\). Let \(f_{x}^{q}\left(t, \mathbf{z}_{q}^{i}; \mathbf{T}_{q}, \varvec{\upbeta}^{q}\right)\) be the sojourn time probability in state x at time t, calculated by using the matrix \(\mathbf{P}_{q}\left(\mathbf{z}_{q}^{i}\right)\). Because each item is observed at every change time, the state occupied at each cut-point is known, and the factors of the likelihood function have the following expressions:
1. If \(t_{i,s-1} \in I_{j}\) and \(t_{i,s} \in J_{j}\),
$$h_{x_{s-1}^{i},x_{s}^{i}}\left(\mathbf{T}_{j},\varvec{\upbeta}^{j} \mid t_{i,s-1},t_{i,s},\mathbf{z}_{l_{s-1}}^{i},\ldots,\mathbf{z}_{l_{s}}^{i}\right) = f_{x_{s-1}^{i}}^{j}\left(t_{i,s}-t_{i,s-1}-1,\mathbf{z}_{j}^{i};\mathbf{T}_{j},\varvec{\upbeta}^{j}\right) T_{x_{s-1}^{i},x_{s}^{i}}^{j}\left(\mathbf{z}_{j}^{i}\right).$$
2. If \(t_{i,s-1} \in I_{j-1}\) and \(t_{i,s} \in J_{j}\),
$$\begin{aligned} h_{x_{s-1}^{i},x_{s}^{i}}\left(\mathbf{T}_{u},\varvec{\upbeta}^{u}, u = j-1, j \mid t_{i,s-1},t_{i,s},\mathbf{z}_{l_{s-1}}^{i},\ldots,\mathbf{z}_{l_{s}}^{i}\right) = {} & f_{x_{s-1}^{i}}^{j-1}\left(c_{j-1}-t_{i,s-1},\mathbf{z}_{j-1}^{i};\mathbf{T}_{j-1},\varvec{\upbeta}^{j-1}\right) \\ & \times f_{x_{s-1}^{i}}^{j}\left(t_{i,s}-c_{j-1}-1,\mathbf{z}_{j}^{i};\mathbf{T}_{j},\varvec{\upbeta}^{j}\right) T_{x_{s-1}^{i},x_{s}^{i}}^{j}\left(\mathbf{z}_{j}^{i}\right). \end{aligned}$$
3. If \(t_{i,s-1} \in I_{j}\) and \(t_{i,s} \in J_{q}\) with \(q - j \ge 2\),
$$\begin{aligned} h_{x_{s-1}^{i},x_{s}^{i}}\left(\mathbf{T}_{u},\varvec{\upbeta}^{u}, u = j,\ldots,q \mid t_{i,s-1},t_{i,s},\mathbf{z}_{l_{s-1}}^{i},\ldots,\mathbf{z}_{l_{s}}^{i}\right) = {} & f_{x_{s-1}^{i}}^{j}\left(c_{j}-t_{i,s-1},\mathbf{z}_{j}^{i};\mathbf{T}_{j},\varvec{\upbeta}^{j}\right) \\ & \times \prod_{u=j+1}^{q-1} f_{x_{s-1}^{i}}^{u}\left(c_{u}-c_{u-1},\mathbf{z}_{u}^{i};\mathbf{T}_{u},\varvec{\upbeta}^{u}\right) \\ & \times f_{x_{s-1}^{i}}^{q}\left(t_{i,s}-c_{q-1}-1,\mathbf{z}_{q}^{i};\mathbf{T}_{q},\varvec{\upbeta}^{q}\right) T_{x_{s-1}^{i},x_{s}^{i}}^{q}\left(\mathbf{z}_{q}^{i}\right). \end{aligned}$$

In all three cases the sojourn times in state \(x_{s-1}^{i}\) add up to \(t_{i,s} - t_{i,s-1} - 1\), and the final step is the transition to \(x_{s}^{i}\).
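The three cases above follow one pattern: the time between \(t_{i,s-1}\) and \(t_{i,s}\) is split at the cut-points, each piece contributes a sojourn factor computed with that interval's parameters, and the transition factor comes from the interval in which \(t_{i,s}\) falls. The Python sketch below implements that pattern under stated assumptions: `sojourn(q, x, d)` and `trans(q, x, y)` are hypothetical callables standing in for \(f_{x}^{q}\) and \(T_{x,y}^{q}\) with the covariates already fixed, and the cut-points are stored as the list \([c_{1}, \ldots, c_{k-1}]\). The last sojourn factor uses the time spent in \(J_{q}\), namely \(t_{i,s} - c_{q-1} - 1\), so that the sojourn times add up to \(t_{i,s} - t_{i,s-1} - 1\) as in case 1. The usage example assumes, purely for illustration, a geometric sojourn \(f = p_{xx}^{d}\) taken from a toy one-step matrix.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Sequence

# sojourn(q, x, d): hypothetical stand-in for f_x^q(d, z_q^i; T_q, beta^q)
# trans(q, x, y):   hypothetical stand-in for T^q_{x,y}(z_q^i)
Sojourn = Callable[[int, int, int], float]
Trans = Callable[[int, int, int], float]


def interval_I(t: int, cuts: Sequence[int]) -> int:
    """Index q with c_{q-1} <= t < c_q (t in I_q); cuts = [c_1, ..., c_{k-1}]."""
    return bisect_right(cuts, t) + 1


def interval_J(t: int, cuts: Sequence[int]) -> int:
    """Index q with c_{q-1} < t <= c_q (t in J_q)."""
    return bisect_left(cuts, t) + 1


def likelihood_factor(t0: int, t1: int, x: int, y: int, cuts: Sequence[int],
                      sojourn: Sojourn, trans: Trans) -> float:
    """Factor h_{x,y} for a change from state x at t0 to state y at t1 > t0."""
    j = interval_I(t0, cuts)
    q = interval_J(t1, cuts)

    def c(r: int) -> int:  # finite cut-point c_r, 1 <= r <= k-1
        return cuts[r - 1]

    if j == q:  # case 1: both times in the same interval
        return sojourn(j, x, t1 - t0 - 1) * trans(j, x, y)
    value = sojourn(j, x, c(j) - t0)  # remainder of I_j
    for u in range(j + 1, q):  # whole intervals in between (case 3 only)
        value *= sojourn(u, x, c(u) - c(u - 1))
    value *= sojourn(q, x, t1 - c(q - 1) - 1)  # part of J_q before the jump
    return value * trans(q, x, y)


# Toy one-step matrices per interval, states numbered from 1 (illustrative).
P = {1: [[0.9, 0.1], [0.0, 1.0]], 2: [[0.8, 0.2], [0.0, 1.0]]}


def geo_sojourn(q: int, x: int, d: int) -> float:
    return P[q][x - 1][x - 1] ** d  # assumed geometric sojourn p_xx^d


def one_step(q: int, x: int, y: int) -> float:
    return P[q][x - 1][y - 1]


print(likelihood_factor(2, 8, 1, 2, [5], geo_sojourn, one_step))  # equals 0.9**3 * 0.8**2 * 0.2
```

Locating \(t_{i,s-1}\) with `bisect_right` and \(t_{i,s}\) with `bisect_left` reproduces the half-open intervals \(I_{q}\) and \(J_{q}\), so a time equal to a cut-point is assigned consistently at both ends.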
The likelihood function is maximized subject to several restrictions. The matrices \(\mathbf{P}_{q}\) and \(\mathbf{P}_{q}\left(\mathbf{z}_{q}^{i}\right)\) associated with the model must be stochastic matrices for any covariate vector \(\mathbf{z}_{q}^{i}\). This restriction ensures that no probability is less than zero or greater than one, whatever the values of the parameters.
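A minimal pure-Python check of this restriction is sketched below, assuming each matrix is stored as a list of rows; the function name `is_stochastic` and the tolerance `tol` are illustrative choices. Inside an optimizer, such a check would be applied to \(\mathbf{P}_{q}\) and to \(\mathbf{P}_{q}\left(\mathbf{z}_{q}^{i}\right)\), for instance for every covariate vector observed in the data.

```python
from typing import Sequence


def is_stochastic(P: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """True if P is square and non-empty, entries lie in [0, 1] and rows sum to 1."""
    n = len(P)
    for row in P:
        if len(row) != n:
            return False  # not square
        if any(p < -tol or p > 1 + tol for p in row):
            return False  # some entry is not a probability
        if abs(sum(row) - 1.0) > tol:
            return False  # row does not sum to one
    return n > 0


print(is_stochastic([[0.9, 0.1], [0.0, 1.0]]))   # True
print(is_stochastic([[1.1, -0.1], [0.0, 1.0]]))  # False: entries outside [0, 1]
```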
Then the cut-points are estimated: the optimum values \(c_{1}, \ldots, c_{k-1}\) are the values that satisfy
subject to \(0 < v_{j} < v_{j+1}\) for \(j = 1, \ldots, k-2\) and \(v_{k-1} < \max_{i} \left\{ t_{i,m_{i}} \right\}\), where each \(v_{j}\) is a natural number. Here \(\left( \hat{\mathbf{T}}_{u}^{v_{1}, \ldots, v_{k-1}}, \hat{\varvec{\upbeta}}_{u}^{v_{1}, \ldots, v_{k-1}}, u = 1, \ldots, k \right)\) are the maximum likelihood estimates of \(\left( \mathbf{T}_{u}, \varvec{\upbeta}^{u}, u = 1, \ldots, k \right)\) for the cut-points \(v_{1}, \ldots, v_{k-1}\).
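For small problems, the optimization over cut-points can be sketched as an exhaustive search over all admissible integer tuples \(0 < v_{1} < \cdots < v_{k-1} < \max_{i} t_{i,m_{i}}\). This is an illustrative approach, not necessarily the authors' procedure: `profile_loglik` is a hypothetical callable that, for fixed cut-points, refits \(\mathbf{T}_{u}\) and \(\varvec{\upbeta}^{u}\) and returns the maximized log-likelihood.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple


def best_cut_points(
    k: int,
    t_max: int,
    profile_loglik: Callable[[Tuple[int, ...]], float],
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustive search over integer cut-points 0 < v_1 < ... < v_{k-1} < t_max.

    profile_loglik(v) is assumed to refit T_u and beta^u for the fixed
    cut-points v and return the maximized log-likelihood.
    """
    best_v: Optional[Tuple[int, ...]] = None
    best_ll = float("-inf")
    for v in combinations(range(1, t_max), k - 1):  # strictly increasing tuples
        ll = profile_loglik(v)
        if ll > best_ll:
            best_v, best_ll = v, ll
    return best_v, best_ll


# Toy usage: a made-up profile log-likelihood that peaks at cut-points (3, 7).
def toy_profile(v: Tuple[int, ...]) -> float:
    return -((v[0] - 3) ** 2 + (v[1] - 7) ** 2)


print(best_cut_points(k=3, t_max=10, profile_loglik=toy_profile))  # ((3, 7), 0)
```

Since `itertools.combinations` returns increasing tuples drawn from `range(1, t_max)`, every candidate automatically satisfies the ordering and range restrictions. The number of candidates, \(\binom{t_{\max}-1}{k-1}\), grows quickly, so this brute-force version is only practical for short follow-up or few cut-points.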
The likelihood function was implemented in MATLAB and maximized with its function fmincon, which finds the minimum of a constrained nonlinear multivariable function, here by means of the interior-point algorithm; maximizing the likelihood therefore amounts to minimizing its negative.
Appendix B
About this article
Cite this article
Ruiz-Castro, J.E., Zenga, M. A general piecewise multi-state survival model: application to breast cancer. Stat Methods Appl 29, 813–843 (2020). https://doi.org/10.1007/s10260-019-00505-6