A nested copula duration model for competing risks with multiple spells

https://doi.org/10.1016/j.csda.2020.106986

Abstract

A copula graphic estimator for the competing risks duration model with multiple spells is presented. By adopting a nested copula structure, the dependencies between risks and between spells are modelled separately. This removes an implicit restriction of popular duration models such as the multivariate mixed proportional hazards model. It is shown that the dependence structure between spells is identifiable and can be estimated, in contrast to the dependence structure between competing risks. Thus, once these two components are allowed to differ, the model is not fully identifiable. This is an important finding related to the general identifiability of competing risks models. Various features of the model are investigated by simulations, and its practicality is illustrated by an application to unemployment duration data.

Introduction

Duration models are important for the analysis of data in many disciplines, including biostatistics, mechanical engineering and the social sciences. As a running example, we consider the duration of unemployment in a model with competing risks and multiple spells. Competing risks means that unemployment can be terminated by different exit types, such as starting a job (risk 1) or withdrawing from the labour market (risk 2). Multiple spells, or repeated occurrences, means that an unemployed person can contribute more than one unemployment period to the data. We suggest a new copula model that permits flexible modelling of the dependencies between competing risks and between multiple spells.

In a competing risks model, the observables are not the latent durations of the different exit types, but only their minimum and the type of exit. In our running example, if an unemployed person is observed to withdraw from the labour market, the latent unemployment duration that would have been terminated by finding a new job is censored by the competing exit and remains unobserved. As a consequence, neither the joint distribution of the risk-specific latent durations nor the respective marginal distributions are identifiable (Cox, 1962, Tsiatis, 1975). Only the exit type and the cause-specific sub-distributions, i.e. the distribution of the minimum of the latent durations jointly with the exit type, are identifiable. In our running example, these are the distributions of the unemployment durations that are terminated by risk 1 and risk 2, respectively.
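The observation scheme can be illustrated with a minimal Python sketch. It assumes two risks, complete follow-up and no independent censoring (assumptions for illustration only), and the function names observe and sub_distribution are hypothetical. Only the minimum of the latent durations and the exit type are passed on, and the empirical cause-specific sub-distribution F_r(t) = Pr(T ≤ t, exit type = r) is computed from these observables alone.

```python
import numpy as np

def observe(latent):
    """Reduce latent durations (n x 2 array, column r is risk r + 1) to observables.

    Returns the observed duration min(T1, T2) and the exit type (1 or 2).
    The larger latent duration is censored by the smaller one and is lost.
    """
    latent = np.asarray(latent, dtype=float)
    duration = latent.min(axis=1)
    cause = latent.argmin(axis=1) + 1
    return duration, cause

def sub_distribution(duration, cause, risk, t):
    """Empirical cause-specific sub-distribution F_risk(t) = Pr(T <= t, cause = risk)."""
    duration = np.asarray(duration)
    cause = np.asarray(cause)
    return np.mean((duration <= t) & (cause == risk))

# Two hypothetical units with latent (new job, withdrawal) durations in months.
dur, cau = observe([[3.0, 10.0], [8.0, 5.0]])
print(dur, cau, sub_distribution(dur, cau, 1, 4.0))
```

Any two latent models that generate the same distribution of (duration, exit type) are indistinguishable to sub_distribution, which is the non-identifiability problem in miniature.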

Copulas are increasingly popular in statistics for modelling the joint distribution of multiple outcomes (e.g. Kole et al., 2007, Joe, 2015, Oh and Patton, 2017, Klein and Kneib, 2016, Klein et al., 2019). The advantage of a copula is that it separates the dependence structure from the marginal distributions, so that the two can be specified separately. Zheng and Klein (1995) show that, provided the copula function is known, the latent marginals in the competing risks model can be nonparametrically identified through their relationship with the cause-specific sub-distributions. The resulting estimator is called the copula graphic estimator; it is compatible with a large variety of copula classes, including Archimedean, hierarchical Archimedean, vine and factor copula models. For a review of different types of copula functions, see Joe (2015). This wide range of copula models provides flexibility in modelling high-dimensional joint distributions in terms of the number of parameters, the features of tail dependence (e.g. different degrees of dependence between extreme events) and the degree of asymmetry (e.g. the dependence between two particular risks can differ from the dependence between another pair of risks). In practice, however, the copula graphic estimator is mainly used with Archimedean copulas, as they permit flexible dependence structures while remaining easy to implement. Specifically, Rivest and Wells (2001) obtain a closed-form relationship between the latent marginals and the cause-specific sub-distributions when the copula belongs to the Archimedean class. Similar links between the cause-specific sub-distributions and the latent marginal distributions have been established in terms of hazard functions for Archimedean copulas (Emura et al., 2019).
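For an Archimedean copula with generator φ applied to the latent survivor functions, the survivor function π of the observed minimum satisfies φ(π(t)) = φ(S_1(t)) + φ(S_2(t)). A minimal sketch of the closed-form idea credits each jump of φ(π̂) at an observed exit to the risk that caused it and inverts φ. It assumes untied durations, no independent censoring and a Clayton generator φ(u) = (u^(-θ) − 1)/θ; the function names are hypothetical, and the sketch stops before the last observation because φ(0) is infinite for this generator.

```python
import numpy as np

def clayton_generator(theta):
    """Clayton generator phi(u) = (u^-theta - 1) / theta and its inverse, theta > 0."""
    def phi(u):
        return (u ** (-theta) - 1.0) / theta
    def phi_inv(s):
        return (1.0 + theta * s) ** (-1.0 / theta)
    return phi, phi_inv

def independence_generator():
    """Generator of the independence copula, phi(u) = -log(u), and its inverse."""
    return (lambda u: -np.log(u)), (lambda s: np.exp(-s))

def copula_graphic(duration, cause, risk, phi, phi_inv):
    """Copula-graphic estimate of the latent survivor function of `risk`.

    Assumes no ties and no independent censoring. Returns the ordered
    durations and S_hat at each of them, excluding the last observation,
    where the empirical survivor of the minimum reaches zero.
    """
    order = np.argsort(duration)
    t = np.asarray(duration, dtype=float)[order]
    d = np.asarray(cause)[order]
    n = len(t)
    at_risk = n - np.arange(n)          # units at risk just before t_i
    pi_before = at_risk / n             # empirical pi(t_i-)
    pi_after = (at_risk - 1) / n        # empirical pi(t_i)
    keep = pi_after > 0
    # Jump of phi(pi_hat) at t_i, credited to `risk` only if it caused the exit.
    jumps = np.where(d[keep] == risk,
                     phi(pi_after[keep]) - phi(pi_before[keep]), 0.0)
    return t[keep], phi_inv(np.cumsum(jumps))
```

With the independence generator the estimate reduces to the Kaplan–Meier estimator that treats exits by the other risk as censored, which is a convenient sanity check for any implementation.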
Despite its computational tractability, the Archimedean class is rich enough to model different types of tail asymmetry: e.g. the Clayton copula exhibits lower tail dependence, the Gumbel copula exhibits upper tail dependence, and the Joe–Clayton copula allows for different degrees of lower and upper tail dependence. At the same time, it is parsimonious enough to be summarised by one or two parameters.
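For reference, the standard lower and upper tail-dependence coefficients of a bivariate copula C, and their values for the three families just mentioned, are given below. These are generic textbook results, not specific to this paper; θ, κ and δ denote the family parameters.

```latex
\begin{align*}
\lambda_L &= \lim_{u \downarrow 0} \frac{C(u,u)}{u},
\qquad \lambda_U = \lim_{u \uparrow 1} \frac{1 - 2u + C(u,u)}{1 - u},\\
\text{Clayton } (\theta > 0) &:\quad \lambda_L = 2^{-1/\theta},\quad \lambda_U = 0,\\
\text{Gumbel } (\theta \ge 1) &:\quad \lambda_L = 0,\quad \lambda_U = 2 - 2^{1/\theta},\\
\text{Joe--Clayton } (\kappa \ge 1,\ \delta > 0) &:\quad \lambda_L = 2^{-1/\delta},\quad \lambda_U = 2 - 2^{1/\kappa}.
\end{align*}
```

Only the Joe–Clayton family has two free parameters that set the two tail coefficients separately.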

Popular alternatives to the copula graphic estimator include the assumption of independent risks (Edin, 1989, Mealli and Pudney, 1996, Lindstrom and Lauster, 2001, Burda et al., 2015, Braun et al., 2020) and an assumed multivariate normal joint distribution (Malevergne and Sornette, 2003, D’Addio and Rosholm, 2005, Van den Goorbergh et al., 2005, Pennington-Cross, 2010, Smith and Vahey, 2016, Agnello et al., 2019). These alternatives share with the copula model the limitation that the choice of dependence structure is typically not guided by existing theoretical research. For our running example of unemployment duration in particular, we are not aware of any economic theory that provides prior hypotheses on the choice of copula. Nevertheless, because the competing risks model is not identifiable, some assumptions on the dependence structure must be made, and these assumptions cannot be tested. In this regard, the copula graphic estimator is more flexible than the alternatives, as it allows the researcher to fit the data with different copulas. When the copula function is unknown, one can also conduct a sensitivity analysis with respect to the choice of copula. For instance, Lo and Wilke (2014) perform a series of Monte Carlo simulation studies and observe that estimation results are more sensitive to the choice of the copula parameter(s) than to the choice of the Archimedean copula function.
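A sensitivity analysis of this kind can be sketched in a few lines. The sketch assumes a Clayton copula for two risks with exponential latent marginals (rates 1.0 and 0.5, chosen for illustration), simulates it with the gamma-frailty (Marshall–Olkin) algorithm, and then re-estimates S_1(t0) with the Rivest and Wells form of the copula graphic estimator under several assumed values of θ. All parameter values and function names are hypothetical. Because the data are identical in every run, differences between the estimates come purely from the untestable dependence assumption.

```python
import numpy as np

def simulate_clayton_risks(n, theta, rates, rng):
    """Latent durations with a Clayton survival copula and exponential marginals.

    Marshall-Olkin: V ~ Gamma(1/theta, 1), U_r = (1 + E_r / V)^(-1/theta) with
    E_r ~ Exp(1); then T_r = -log(U_r) / rate_r has survivor exp(-rate_r * t).
    """
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
    e = rng.exponential(size=(n, 2))
    u = (1.0 + e / v[:, None]) ** (-1.0 / theta)
    return -np.log(u) / np.asarray(rates, dtype=float)

def clayton_cg_estimate(duration, cause, t0, theta):
    """Copula graphic estimate of S_1(t0) under an assumed Clayton theta."""
    duration = np.asarray(duration, dtype=float)
    cause = np.asarray(cause)
    order = np.argsort(duration)
    t, d = duration[order], cause[order]
    if t0 >= t[-1]:
        raise ValueError("t0 must lie below the largest observed duration")
    n = len(t)
    mask = t <= t0
    at_risk = (n - np.arange(n))[mask]
    phi = lambda u: (u ** (-theta) - 1.0) / theta
    jumps = np.where(d[mask] == 1, phi((at_risk - 1) / n) - phi(at_risk / n), 0.0)
    return (1.0 + theta * jumps.sum()) ** (-1.0 / theta)

rng = np.random.default_rng(2020)
latent = simulate_clayton_risks(20_000, theta=2.0, rates=[1.0, 0.5], rng=rng)
duration, cause = latent.min(axis=1), latent.argmin(axis=1) + 1

t0 = 0.5  # the simulated truth is S_1(t0) = exp(-0.5)
estimates = {th: clayton_cg_estimate(duration, cause, t0, th) for th in (0.01, 1.0, 2.0, 4.0)}
for th, s in estimates.items():
    print(f"assumed theta = {th}: S_1({t0}) estimate = {s:.3f}")
```

A value of θ close to zero approximates the independence assumption, so the first row shows what an analysis assuming independent risks would report for the same data.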

The use of multiple spells data to address the non-identifiability of the competing risks model has also gained popularity in practice, as having several observations per unit contributes additional information to the model. Honoré (1993) shows that the mixed proportional hazard model (MPHM), a specific class of duration models, is identified under much weaker restrictions in the presence of multiple spells than if there were only one spell per unit. This finding has since triggered an extensive economic literature on multiple spells data (for example Horowitz and Lee, 2004, Kalwij, 2010, Arranz and Canto, 2012, Carrasco and Garcia-Perez, 2015). The key component for modelling dependencies between risks and between spells in the MPHM is a random frailty term. Frailty corresponds to omitted variables that affect the latent durations. In our running example, the frailty term could be the degree of risk aversion of an unemployed person, which is usually unobserved. More risk-averse individuals tend to have shorter unemployment durations because they start a new job sooner, either in their trained occupation or in a different profession. More risk-averse individuals also tend to take up jobs faster in each of their unemployment spells. In this regard, unobserved risk aversion determines the (positive) dependency between risks (types of jobs) as well as between spells. It is therefore an implicit assumption of the MPHM that the risks and the spells share the same dependence structure. Even though it is difficult to establish a one-to-one link between the mixed proportional hazard model and the copula-based duration model, Lo et al. (2017) connect the frailty term in the MPHM with the copula function by showing that the single spell MPHM (Han and Hausman, 1990) is a special case of the copula model. Other contributions that consider copula duration models and frailty are Ha et al. (2019) and Wang et al. (2020).
In this paper, we argue that the frailty in the multiple-spell MPHM is likewise related to a multiple-spell variant of the copula model.
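The implicit restriction of a shared frailty can be illustrated with a small simulation. The sketch assumes one gamma frailty per unit, with mean 1 and variance θ, that multiplies constant baseline hazards for two risks in each of two spells; there are no covariates, and all values and function names are illustrative assumptions. Because the same frailty drives every latent duration, Kendall's τ between the two risks within a spell and between the same risk across spells should both be close to θ/(θ + 2), the Kendall's τ of the Clayton copula that gamma frailty induces.

```python
import numpy as np

def kendall_tau(x, y):
    """Sample Kendall's tau for continuous data (no ties), O(n^2) time and memory."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair appears twice; the diagonal contributes zero.
    return float((sx * sy).sum()) / (n * (n - 1))

def simulate_shared_frailty(n, theta, rates, rng):
    """Latent durations T[i, k, r] for two spells (k) and two risks (r).

    A single gamma frailty V_i with mean 1 and variance theta scales every
    hazard of unit i: given V_i, T[i, k, r] ~ Exponential(rate=V_i * rates[r]).
    """
    v = rng.gamma(shape=1.0 / theta, scale=theta, size=n)
    hazard = v[:, None, None] * np.asarray(rates, dtype=float)[None, None, :]
    return rng.exponential(size=(n, 2, 2)) / hazard

rng = np.random.default_rng(1993)
theta = 1.0
T = simulate_shared_frailty(1_500, theta, rates=[0.8, 0.3], rng=rng)
tau_risks = kendall_tau(T[:, 0, 0], T[:, 0, 1])   # risk 1 vs risk 2 in spell 1
tau_spells = kendall_tau(T[:, 0, 0], T[:, 1, 0])  # risk 1 in spell 1 vs spell 2
print(f"tau(risks) = {tau_risks:.3f}, tau(spells) = {tau_spells:.3f}, "
      f"theory = {theta / (theta + 2):.3f}")
```

With a single frailty there is no parameter that could make the two dependencies differ; relaxing exactly this restriction is what a nested copula is meant to do.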

We contribute to this literature by considering a nested copula model that accounts for (possibly different) dependencies between the competing risks and between multiple spells. This relaxes the implicit assumption of the MPHM that the risks and the spells share the same dependence structure (Honoré, 1993, Horny and Picchio, 2010), which can be important in some applications. In the context of our running example, the unobservable frailty could also be the motivation of the unemployed person. Intuitively, unobserved motivation causes a positive dependency between spells, as a more motivated unemployed person tends to have shorter unemployment durations in all her unemployment spells. However, whether unobserved motivation causes a positive or a negative dependency between risks is unclear. For instance, higher motivation may shorten the time to employment but lengthen the time to dropping out of the labour force. Motivation may thus affect the two risks in opposite directions, which would induce a negative rather than a positive dependency between them. The direction of the risk dependency is therefore harder to anticipate. More importantly, there is no a priori reason why the dependency across risks should be the same as the dependency across multiple spells. This is the starting point of our approach.

This paper contributes as follows. First, it adopts a nested Archimedean copula framework (see Hofert and Mächler, 2011, Hofert, 2012) to model the dependencies between risks and between multiple spells. Second, it is shown that only the dependency between multiple spells is identifiable and can be estimated. As in single spell models, the competing risks dependence structure is not identifiable, and the presence of multiple spells does not change this. Third, we present a multiple-stage procedure to simulate data and conduct a simulation study to investigate the performance of the proposed model and to assess how misspecification of various model components leads to estimation bias in quantities of interest. It is shown that the estimated risk-specific latent marginals and the partial effects of covariates on these marginals depend critically on the assumed competing risks dependence structure. The MPHM, which assumes identical dependence structures between risks and spells, therefore provides only a snapshot and ignores the range of possible result patterns.
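To show what a nested Archimedean structure looks like, the sketch below evaluates one possible two-level Clayton nesting for K spells and two risks: an inner Clayton copula with parameter θ_risk joins the two risk-specific arguments within each spell, and an outer Clayton copula with parameter θ_spell joins the spells. This arrangement and the restriction to Clayton generators are assumptions for illustration and need not match the model of Section 2; the function names are hypothetical. For two Clayton generators, θ_spell ≤ θ_risk is the usual sufficient nesting condition, and with θ_spell = θ_risk the structure collapses to an ordinary Clayton copula in all arguments.

```python
def clayton_psi(s, theta):
    """Inverse Clayton generator psi(s) = (1 + s)^(-1/theta)."""
    return (1.0 + s) ** (-1.0 / theta)

def clayton_psi_inv(u, theta):
    """Clayton generator psi^{-1}(u) = u^(-theta) - 1 (unscaled form)."""
    return u ** (-theta) - 1.0

def clayton(us, theta):
    """d-dimensional Clayton copula C(u_1, ..., u_d) for theta > 0."""
    return clayton_psi(sum(clayton_psi_inv(u, theta) for u in us), theta)

def nested_clayton(u, theta_spell, theta_risk):
    """Hypothetical nested copula C_spell(C_risk(u[0]), ..., C_risk(u[K-1])).

    u[k] holds the (risk 1, risk 2) arguments of spell k. Requires
    0 < theta_spell <= theta_risk (sufficient nesting condition).
    """
    if not 0 < theta_spell <= theta_risk:
        raise ValueError("need 0 < theta_spell <= theta_risk")
    inner = [clayton(pair, theta_risk) for pair in u]   # within-spell risk copulas
    return clayton(inner, theta_spell)                  # across-spell copula

u = [(0.3, 0.6), (0.5, 0.8)]
print(nested_clayton(u, theta_spell=0.5, theta_risk=2.0))
```

Swapping the roles, with spells inner and risks outer, is the other natural arrangement; which one is appropriate depends on which dependence is assumed to be stronger.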

The paper is structured as follows. Section 2 introduces the model. Section 3 presents the results of a simulation study and Section 4 contains an application to seasonal unemployment duration data. Section 5 provides additional remarks and ideas for extensions.


The model

We restrict the model to two competing risks, $r = 1, 2$. An extension to more than two risks is straightforward by applying risk pooling (Lo and Wilke, 2014). For the $k$th spell, $k = 1, \ldots, K$, there is one pair of latent durations for risks 1 and 2. Let $T_{rk}$ be the latent duration for risk $r$ in the $k$th spell. The number of spells $K$ is random with $\Pr(K \ge 2) > 0$, and let $k$ be the realised number of spells. Given that $K = k$ for some $k \ge 1$, we observe $k$ covariate vectors $x_1, \ldots, x_k \in \mathbb{R}^p$. Regressors $x_k$ are allowed to vary

Simulation study

We conduct a comprehensive simulation study to investigate the properties of several variants of the model of Section 2. Existing simulation designs for multiple spells models typically consider only the cause-specific distributions (e.g. Sankaran and Anisha, 2011) or the sub-distributions (e.g. Huang et al., 2017), with dependencies that come from parametric frailty distributions. Simulating data for our suggested model is complicated, as it requires known cumulative incidences and marginals under an assumed nested

Analysis of multiple spells unemployment

In this section we apply the model to real data. The empirical analysis of unemployment duration is of great societal importance, as documented by a large number of academic publications. While most analyses rely on single spell data, a number of contributions have used multiple spells data (for example Rød and Westlie, 2012, Kalwij, 2010, Carrasco and Garcia-Perez, 2015). It is therefore interesting to see what insights the models suggested in Section 2 produce.

In our application we

Robustness and extensions

In many applications there is little prior knowledge about model components such as the dependence structure and the functional form of the cumulative incidences. While the latter can be tested empirically against some alternative without knowing the dependence structure, it is difficult to specify the copula-related parts of the likelihood. The copula structure assumed in model (1) is therefore likely to be incorrect. Although the lack of intuition regarding the assumption could be

References (45)

  • Carrasco, R. et al.

    Employment dynamics of immigrants versus natives: Evidence from the boom bust period in Spain, 2000-2011

    Econ. Inq.

    (2015)
  • Carrière, J.F.

    Dependent Decrement Theory

    Trans. Soc. Actuar.

    (1994)
  • Carrière, J.F.

    Removing cancer when it is correlated with other causes of death

    Biom. J.

    (1995)
  • Cox, D.R., 1962. Renewal Theory,...
  • D’Addio, A. et al.

    Exits from temporary jobs in Europe: A competing risks analysis

    Labour Econ.

    (2005)
  • Dorner, M. et al.

    The sample of integrated labour market biographies

    Schmollers Jahrb.

    (2010)
  • Edin, P.

    Unemployment duration and competing risks: Evidence from Sweden

    Scand. J. Econ.

    (1989)
  • Emura, T. et al.

    Comparison of the marginal hazard model and the subdistribution hazard model for competing risks under an assumed copula

    Stat. Methods Med. Res.

    (2019)
  • Ha, I.D. et al.

    Profile likelihood approaches for semiparametric copula and frailty models for clustered survival data

    J. Appl. Stat.

    (2019)
  • Han, A.K. et al.

    Flexible parametric estimation of duration and competing risk models

    J. Appl. Econometrics

    (1990)
  • Hofert, M.

    A stochastic representation and sampling algorithm for nested Archimedean copulas

    J. Stat. Comput. Simul.

    (2012)
  • Hofert, M. et al.

    Nested Archimedean copulas meet R: The nacopula package

    J. Stat. Softw.

    (2011)