
What price semiparametric Cox regression?


Abstract

Cox’s proportional hazards regression model is the standard method for modelling censored life-time data with covariates. In its standard form, this method relies on a semiparametric proportional hazards structure, leaving the baseline unspecified. Naturally, specifying a parametric model also for the baseline hazard, leading to fully parametric Cox models, will be more efficient when the parametric model is correct, or close to correct. The aim of this paper is two-fold. (a) We compare parametric and semiparametric models in terms of their asymptotic relative efficiencies when estimating different quantities. We find that for some quantities the gain of restricting the model space is substantial, while it is negligible for others. (b) To deal with such selection in practice we develop certain focused and averaged focused information criteria (FIC and AFIC). These aim at selecting the most appropriate proportional hazards models for given purposes. Our methodology applies also to the simpler case without covariates, when comparing Kaplan–Meier and Nelson–Aalen estimators to parametric counterparts. Applications to real data are also provided, along with analyses of theoretical behavioural aspects of our methods.


Notes

  1. Slightly adjusted estimators not influencing the theory may typically be applied when there are tied events; see e.g. Aalen et al. (2008, Ch. 3.1.3).

  2. If \(\gamma \) influences the censoring mechanism and covariate distribution, then (11) is only a ‘partial’ likelihood, and not a true one. This has no consequences for inference, however.

  3. We have avoided introducing the notation of Hadamard differentiability tangentially to a subset of \(\mathbb {D}\), as such notions are better stated explicitly in our concrete cases.

References

  • Aalen OO, Gjessing HK (2001) Understanding the shape of the hazard rate: a process point of view [with discussion and a rejoinder]. Stat Sci 16:1–22

  • Aalen OO, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin

  • Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, Berlin

  • Borgan Ø (1984) Maximum likelihood estimation in parametric counting process models, with applications to censored failure time data. Scand J Stat 11:1–16

  • Breslow NE (1972) Contribution to the discussion of the paper by D.R. Cox. J R Stat Soc Ser B 34:216–217

  • Claeskens G, Hjort NL (2003) The focused information criterion [with discussion and a rejoinder]. J Am Stat Assoc 98:900–916

  • Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge

  • Cox DR (1972) Regression models and life-tables [with discussion and a rejoinder]. J R Stat Soc Ser B 34:187–220

  • Efron B (1977) The efficiency of Cox’s likelihood function for censored data. J Am Stat Assoc 72:557–565

  • Hjort NL (1985) Bootstrapping Cox’s regression model. Technical report, Department of Statistics, Stanford University

  • Hjort NL (1990) Goodness of fit tests in models for life history data based on cumulative hazard rates. Ann Stat 18:1221–1258

  • Hjort NL (1992) On inference in parametric survival data models. Int Stat Rev 60:355–387

  • Hjort NL (2008) Focused information criteria for the linear hazard regression model. In: Vonta F, Nikulin M, Limnios N, Huber-Carol C (eds) Statistical models and methods for biomedical and technical systems. Birkhäuser, Boston, pp 487–502

  • Hjort NL, Claeskens G (2003) Frequentist model average estimators [with discussion and a rejoinder]. J Am Stat Assoc 98:879–899

  • Hjort NL, Claeskens G (2006) Focused information criteria and model averaging for the Cox hazard regression model. J Am Stat Assoc 101:1449–1464

  • Hjort NL, Pollard DB (1993) Asymptotics for minimisers of convex processes. Technical report, Department of Mathematics, University of Oslo

  • Jeong JH, Oakes D (2003) On the asymptotic relative efficiency of estimates from Cox’s model. Sankhya 65:422–439

  • Jeong JH, Oakes D (2005) Effects of different hazard ratios on asymptotic relative efficiency estimates from Cox’s model. Commun Stat Theory Methods 34:429–448

  • Jullum M, Hjort NL (2017) Parametric or nonparametric: the FIC approach. Stat Sin 27:951–981

  • Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York

  • Meier P, Karrison T, Chappell R, Xie H (2004) The price of Kaplan–Meier. J Am Stat Assoc 99:890–896

  • Miller R (1983) What price Kaplan–Meier? Biometrics 39:1077–1081

  • Oakes D (1977) The asymptotic information in censored survival data. Biometrika 64:441–448

  • van der Vaart A (2000) Asymptotic statistics. Cambridge University Press, Cambridge


Acknowledgements

Our efforts have been supported in part by the Norwegian Research Council, through the project FocuStat (Focus Driven Statistical Inference With Complex Data) and the research-based innovation centre Statistics for Innovation (sfi)\(^2\). We are also grateful to the reviewers and editor Mei-Ling T. Lee for constructive comments which led to an improved presentation.

Author information


Correspondence to Martin Jullum.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 203 KB)

Appendix

Estimating variances and covariances

For FIC and AFIC applications we need not only the focus parameter estimators \({{\widehat{\mu }}}_\mathrm{cox}\) and \({{\widehat{\mu }}}_\mathrm{pm}\) themselves (yielding also \({\widehat{b}}={{\widehat{\mu }}}_\mathrm{pm}-{{\widehat{\mu }}}_\mathrm{cox}\)), but also (consistent) recipes for estimating the quantities \(v_\mathrm{cox}\), \(v_c\), \(v_\mathrm{pm}\), making up the covariance matrix \(\Sigma _\mu \) in (31). The main ingredient in \(\Sigma _\mu \) is indeed \(\Sigma (s,t)\), with blocks as in (27), consisting of the quantities

$$\begin{aligned} \sigma ^2(t), \quad F(t), \quad J_\mathrm{cox}, \quad J, \quad K, \quad \nu (t),\quad \mathrm{and} \quad G. \end{aligned}$$
(39)

In this appendix we provide explicit consistent estimators for these quantities, in addition to a simple consistent estimation strategy for other quantities typically involved in \(\Sigma _\mu \).

The principle we essentially follow is to insert the empirical analogues of all unknown quantities. This amounts firstly to estimating \(\beta _\mathrm{true}\), \(\beta _0\), \(\theta _0\), \(A_\mathrm{true}(\cdot )\), by respectively \({\widehat{\beta }}_\mathrm{cox}\), \({\widehat{\beta }}_\mathrm{pm}\), \(\widehat{\theta }\), \({\widehat{A}}_\mathrm{cox}(\cdot )\). Secondly, \(r^{(k)}(s;h(\beta _\mathrm{true},\beta _0))\) is estimated by \(n^{-1}R^{(k)}_n(s;h({\widehat{\beta }}_\mathrm{cox},{\widehat{\beta }}_\mathrm{pm}))\) for \(k=0,1,2\), and h some simple continuous function combining \(\beta \) and \(\beta _0\). For f some vector function involving unknown quantities, integrals of the form \(\int _0^t f\alpha _\mathrm{true}\,\mathrm{d}s=\int _0^t f\, \mathrm{d}A_\mathrm{true}\) are then estimated by \(\int _0^t {\widehat{f}}\,\mathrm{d}{\widehat{A}}_\mathrm{cox}= \sum _{T_i \le t} {\widehat{f}}(T_i)D_i/R^{(0)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})\). Note also that integrals \(\int _0^t f(s)r^{(k)}(s;h(\beta _\mathrm{true},\beta _0))\,\mathrm{d}s\) are estimated by \(n^{-1}\int _0^t {\widehat{f}}(s) R_n^{(k)}(s;h({\widehat{\beta }}_\mathrm{cox},{\widehat{\beta }}_\mathrm{pm}))\, \mathrm{d}s\), which may be expressed as the sum

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left\{ \int _0^{\min (T_i,t)}{\widehat{f}}(s)\,\mathrm{d}s\right\} R_{(i)}^{(k)}(h({\widehat{\beta }}_\mathrm{cox},{\widehat{\beta }}_\mathrm{pm})), \end{aligned}$$
(40)

where \(R_{(i)}^{(k)}(h(\cdot ))=R_{(i)}^{(k)}(0;h(\cdot ))\) is equal to respectively \(\exp \{X_i^{\mathrm{t}}h(\cdot )\},X_i \exp \{X_i^{\mathrm{t}}h(\cdot )\}\), and \(X_i X_i^{\mathrm{t}}\exp \{X_i^{\mathrm{t}}h(\cdot )\}\) for \(k=0,1,2\). Thus, estimators of the form \(\int _0^t f(s)g^{(k)}(s;\beta )\, \mathrm{d}s\) may be expressed by

$$\begin{aligned}&\frac{1}{n}\sum _{T_i \le t} {\widehat{f}}(T_i)D_i\frac{R_n^{(k)}(T_i;{\widehat{\beta }}_\mathrm{cox}+ {\widehat{\beta }})}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})}\nonumber \\&\quad -\frac{1}{n}\sum _{i=1}^n\left\{ \int _0^{\min (T_i,t)}{\widehat{f}}(s)\alpha _{\mathrm{pm}} (s;\widehat{\theta })\,\mathrm{d}s\right\} R_{(i)}^{(k)}({\widehat{\beta }}_\mathrm{pm}+{\widehat{\beta }}), \end{aligned}$$
(41)

with \({\widehat{\beta }}\) inserted to estimate \(\beta \). The f-function is sometimes partly estimated by a step function, as when f(s) equals \(A(s)f_1(s)\), \(\sigma ^2(\min (s,t))f_1(s)\) or \(F(s)f_1(s)\) for some function \(f_1\). In such cases, integrals like \(\int _0^{t} f(s)r^{(k)}(s;h(\beta ,\beta _0))\,\mathrm{d}s\) are decomposed even further. To see this, assume \(f(s)=f_0(s)f_1(s)\) is estimated by \({\widehat{f}}(s)={\widehat{f}}_0(s){\widehat{f}}_1(s)\), where \({\widehat{f}}_0(s)\) is a step function of the form \({\widehat{f}}_0(s)=\sum _{j=1}^n \text {step}_{j} \mathbf {1}_{\{ T_j \le s \}}=\sum _{j:T_j\le s} \text {step}_j\). Then (40) decomposes further into the ‘triangle sum’

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\sum _{j:T_j< \min (T_i,t)} \text {step}_{j} \left\{ \int _{T_j}^{\min (T_i,t)}{\widehat{f}}_1(s)\,\mathrm{d}s \right\} R_{(i)}^{(k)}(h({\widehat{\beta }}_\mathrm{cox},{\widehat{\beta }}_\mathrm{pm})). \end{aligned}$$

As a consequence, \(\int _0^t f(s)g^{(k)}(s;\beta )\, \mathrm{d}s\) also decomposes further, so that the subtrahend in (41) equals

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\sum _{j:T_j< \min (T_i,t)} \text {step}_{j} \left\{ \int _{T_j}^{\min (T_i,t)}{\widehat{f}}_1(s) \alpha _{\mathrm{pm}}(s;\widehat{\theta })\,\mathrm{d}s \right\} R_{(i)}^{(k)}({\widehat{\beta }}_\mathrm{pm}+{\widehat{\beta }}). \end{aligned}$$
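To make the plug-in recipe above concrete, the following minimal Python sketch implements its basic building blocks: the at-risk sums \(R_n^{(k)}(s;\beta )\) (assuming the standard counting-process definition \(R_n^{(k)}(s;\beta )=\sum _{i:T_i\ge s}X_i^{\otimes k}\exp (X_i^{\mathrm{t}}\beta )\) used earlier in the paper), the per-subject terms \(R_{(i)}^{(k)}\) appearing in (40), and the generic integral estimator \(\int _0^t {\widehat{f}}\,\mathrm{d}{\widehat{A}}_\mathrm{cox}\). All function and variable names are our own illustration; T, D and X are taken to be NumPy arrays holding the follow-up times, the event indicators and the covariate matrix.

```python
import numpy as np

def risk_set_sum(s, k, beta, T, X):
    """R_n^(k)(s; beta): sum over the risk set {i : T_i >= s} of
    X_i^{tensor k} * exp(X_i' beta), for k = 0 (scalar), 1 (vector), 2 (matrix)."""
    at_risk = T >= s
    w = np.exp(X[at_risk] @ beta)                    # exp{X_i' beta} on the risk set
    if k == 0:
        return w.sum()
    if k == 1:
        return X[at_risk].T @ w                      # sum_i X_i exp{X_i' beta}
    return (X[at_risk] * w[:, None]).T @ X[at_risk]  # sum_i X_i X_i' exp{X_i' beta}

def R_i(i, k, gamma, X):
    """Per-subject term R_(i)^(k)(gamma): exp{X_i' gamma}, X_i exp{X_i' gamma},
    or X_i X_i' exp{X_i' gamma}, for k = 0, 1, 2."""
    w = np.exp(X[i] @ gamma)
    if k == 0:
        return w
    if k == 1:
        return X[i] * w
    return np.outer(X[i], X[i]) * w

def plugin_integral(f_hat, t, beta_cox, T, D, X):
    """Estimate int_0^t f dA_true by sum_{T_i <= t} f_hat(T_i) D_i / R_n^(0)(T_i; beta_cox)."""
    return sum(f_hat(Ti) / risk_set_sum(Ti, 0, beta_cox, T, X)
               for Ti, Di in zip(T, D) if Di == 1 and Ti <= t)
```

With these in hand, the sum (40) amounts to averaging \(\{\int _0^{\min (T_i,t)}{\widehat{f}}(s)\,\mathrm{d}s\}\,R_{(i)}^{(k)}\) over the subjects, with the one-dimensional integrals computed analytically or by numerical quadrature.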

Let us now turn to the actual estimation of the quantities in (39).

  1.

    First, consider \(\sigma ^2(t)\) as given in (9); this estimator, together with those of items 2 and 3, is implemented in the code sketch following this list. The estimation strategy outlined above gives the estimator

    $$\begin{aligned} {\widehat{\sigma }}^2(t)=\int _0^t \frac{\mathrm{d}{{\widehat{A}}}_\mathrm{cox}(s)}{ n^{-1}R_n^{(0)}(s;{{\widehat{\beta }}}_\mathrm{cox})} =\sum _{T_i\le t}\frac{nD_i}{ \{R^{(0)}_n(T_i;{{\widehat{\beta }}}_\mathrm{cox})\}^2}. \end{aligned}$$
  2.

    Next consider F(t) as given in (9). Writing \(E_n(s;\beta )\) for \(R_n^{(1)}(s;\beta )/R_n^{(0)}(s;\beta )\), this function is similarly estimated by

    $$\begin{aligned} {\widehat{F}}(t)=\int _0^t E_n(s;{\widehat{\beta }}_\mathrm{cox})\,\mathrm{d}{{\widehat{A}}}_\mathrm{cox}(s) =\sum _{T_i\le t} \frac{D_i E_n(T_i;{\widehat{\beta }}_\mathrm{cox})}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})}. \end{aligned}$$
  3.

    Consider now \(J_\mathrm{cox}\) as given in (7). Following the plug-in procedure, we get

    $$\begin{aligned} \widehat{J}_\mathrm{cox}=\frac{1}{n} \sum _{T_i \le \tau } \left\{ \frac{R^{(2)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})}{R^{(0)}_n(T_i; {\widehat{\beta }}_\mathrm{cox})} - E_n(T_i;{\widehat{\beta }}_\mathrm{cox})E_n(T_i;{\widehat{\beta }}_\mathrm{cox})^{\mathrm{t}}\right\} D_i. \end{aligned}$$

    Alternatively, \(J_\mathrm{cox}\) may be estimated by \(n^{-1}\) times minus the Hessian matrix of the log-partial likelihood in (4).

  4.

    Consider J as given in (14) with blocks as in (16). Following the plug-in procedure, we estimate J by \(\widehat{J}\) having blocks

    $$\begin{aligned} \widehat{J}_{11}&= \frac{1}{n}\sum _{i=1}^nR_{(i)}^{(0)}({\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} \{\psi (s;\widehat{\theta })\psi (s;\widehat{\theta })^{\mathrm{t}} +\psi ^\mathrm{d}(s;\widehat{\theta })\}\alpha _{\mathrm{pm}}(s;\widehat{\theta })\, \mathrm{d}s\\&\quad -\frac{1}{n} \sum _{i=1}^n\psi ^\mathrm{d}(T_i;\widehat{\theta })D_i , \\ \widehat{J}_{12}&= \widehat{J}_{21}^{\mathrm{t}} = \frac{1}{n}\sum _{i=1}^n\int _0^{T_i} \psi (s;\widehat{\theta })\alpha _{\mathrm{pm}}(s;\widehat{\theta }) \, \mathrm{d}s R_{(i)}^{(1)}({\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}\\&= \frac{1}{n}\sum _{i=1}^nA^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })R_{(i)}^{(1)} ({\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}, \\ \widehat{J}_{22}&= \frac{1}{n}\sum _{i=1}^nR_{(i)}^{(2)} ({\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} \alpha _{\mathrm{pm}} (s;\widehat{\theta }) \, \mathrm{d}s =\frac{1}{n}\sum _{i=1}^nR_{(i)}^{(2)} ({\widehat{\beta }}_\mathrm{pm}) A_{\mathrm{pm}}(T_i;\widehat{\theta }). \end{aligned}$$

    Similarly to \(J_\mathrm{cox}\), J may be estimated by \(n^{-1}\) times minus the Hessian of the parametric log-likelihood in (11).

  5.

    We continue with K as given in (14). The plug-in procedure applied to the formulae in (17) results in K being estimated by \(\widehat{K}\) having blocks

    $$\begin{aligned} \widehat{K}_{11}&= \frac{1}{n}\sum _{i=1}^n\bigg [ \psi (T_i;\widehat{\theta })\psi (T_i;\widehat{\theta })^{\mathrm{t}}\\&\quad - \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })\psi (T_i;\widehat{\theta })^{\mathrm{t}} + \psi (T_i;\widehat{\theta })A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })^{\mathrm{t}}\} \frac{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})}\bigg ]D_i\\&\quad + \frac{1}{n}\sum _{i=1}^nR_{(i)}^{(0)}(2{\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} [A^{\mathrm{d}}_\mathrm{pm}(s;\widehat{\theta })\psi (s;\widehat{\theta })^{\mathrm{t}}\\&\quad + \psi (s;\widehat{\theta })A^{\mathrm{d}}_\mathrm{pm}(s;\widehat{\theta })^{\mathrm{t}}] \alpha _{\mathrm{pm}}(s;\widehat{\theta })\, \mathrm{d}s,\\ \widehat{K}_{12}&=\widehat{K}_{21}^{\mathrm{t}}= \frac{1}{n}\sum _{i=1}^n\bigg [\psi (T_i;\widehat{\theta }) E_n(T_i;{\widehat{\beta }}_\mathrm{cox})^{\mathrm{t}}\\&\quad - \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })+\psi (T_i;\widehat{\theta }) A_\mathrm{pm}(T_i;\widehat{\theta })\}\frac{R_n^{(1)}(T_i; {\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}}{R_n^{(0)} (T_i;{\widehat{\beta }}_\mathrm{cox})} \bigg ]D_i \\&\quad + \frac{1}{n}\sum _{i=1}^n\left[ \int _0^{T_i} \{A^{\mathrm{d}}_\mathrm{pm}(s;\widehat{\theta })+\psi (s; \widehat{\theta })A_\mathrm{pm}(s;\widehat{\theta })\} \alpha _{\mathrm{pm}} (s;\widehat{\theta })\, \mathrm{d}s \right] R_{(i)}^{(1)} (2{\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}, \\ \widehat{K}_{22}&=\frac{1}{n}\sum _{i=1}^n\frac{R_n^{(2)}(T_i; {\widehat{\beta }}_{\mathrm{cox}})-2R_n^{(2)}(T_i;{\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})A_\mathrm{pm}(T_i;\widehat{\theta })}{R_n^{(0)}(T_i;{\widehat{\beta }}_{\mathrm{cox}})} D_i \\&+ \quad \frac{2}{n} \sum _{i=1}^nR_{(i)}^{(2)}(2{\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} \alpha _{\mathrm{pm}}(s;\widehat{\theta })A_\mathrm{pm}(s; \widehat{\theta })\, \mathrm{d}s. \end{aligned}$$
  6.

    We go on to the covariance \(\nu (t)=\mathrm{Cov}(W(t),U^{\mathrm{t}})\) as given in (29). This covariance may be estimated by

    $$\begin{aligned} \widehat{\nu }(t)&= \begin{pmatrix} \sum _{T_i \le t} D_i\psi (T_i;\widehat{\theta })/R_n^{(0)} (T_i;{\widehat{\beta }}_\mathrm{cox}) \\ \widehat{F}(t) \end{pmatrix}^{\mathrm{t}}\\&\quad - \frac{1}{n}\sum _{i=1}^n\frac{D_i \widehat{\sigma }^2(\min (T_i,t))}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} R_n^{(0)}(T_i;2{\widehat{\beta }}_\mathrm{cox}) \psi (T_i;\widehat{\theta }) \\ R_n^{(1)}(T_i;2{\widehat{\beta }}_\mathrm{cox}) \end{pmatrix}^{\mathrm{t}} \\&\quad + \sum _{i=1}^n\sum _{j:T_j < \min (T_i,t)} \frac{D_j}{R^{(0)}_n(T_j;{\widehat{\beta }}_\mathrm{cox})^2} \begin{pmatrix} R_{(i)}^{(0)}({\widehat{\beta }}_\mathrm{pm}+{\widehat{\beta }}_\mathrm{cox}) \{A^{\mathrm{d}}_{\mathrm{pm}}(T_i;\widehat{\theta })-A^{\mathrm{d}}_\mathrm{pm}(T_j;\widehat{\theta })\} \\ R_{(i)}^{(1)}({\widehat{\beta }}_\mathrm{pm}+{\widehat{\beta }}_\mathrm{cox}) \{A_\mathrm{pm}(T_i;\widehat{\theta })-A_\mathrm{pm}(T_j;\widehat{\theta })\} \end{pmatrix}^{\mathrm{t}}. \end{aligned}$$
  7.

    Finally, we estimate the covariance \(G=\mathrm{Cov}(U_\mathrm{cox},U^{\mathrm{t}})\) as given in (28). We use

    $$\begin{aligned} \widehat{G}&= - \frac{1}{n}\sum _{i=1}^n\frac{D_i}{R^{(0)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} \psi (T_i;\widehat{\theta })\{{\widehat{A}}_\mathrm{cox}(T_i) R^{(1)}_n(T_i;2{\widehat{\beta }}_\mathrm{cox})^{\mathrm{t}} - R^{(0)}_n(T_i; 2{\widehat{\beta }}_\mathrm{cox})\widehat{F}(T_i)^{\mathrm{t}}\} \\ {\widehat{A}}_ \mathrm{cox}(T_i)R^{(2)}_n(T_i;2{\widehat{\beta }}_\mathrm{cox}) - R^{(1)}_n(T_i; 2{\widehat{\beta }}_\mathrm{cox})\widehat{F}(T_i)^{\mathrm{t}} \end{pmatrix}^{\mathrm{t}} \\&\quad - \frac{1}{n} \sum _{i=1}^n\sum _{j:T_j \le T_i} \frac{D_j E_n(T_j;{\widehat{\beta }}_\mathrm{cox})}{R_n^{(0)}(T_j;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} R^{(0)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm}) \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta }) - A^{\mathrm{d}}_\mathrm{pm}(T_j;\widehat{\theta })\} \\ R^{(1)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm}) \{A_\mathrm{pm}(T_i;\widehat{\theta })-A_\mathrm{pm}(T_j;\widehat{\theta })\} \end{pmatrix}^{\mathrm{t}} \\&\quad + \frac{1}{n} \sum _{i=1}^n\sum _{j:T_j \le T_i} \frac{D_j}{R^{(0)}_n(T_j;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta }) - A^{\mathrm{d}}_\mathrm{pm}(T_j;\widehat{\theta })\} R^{(1)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}} \\ \{A_\mathrm{pm}(T_i;\widehat{\theta })-A_\mathrm{pm}(T_j;\widehat{\theta })\} R^{(2)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm}) \end{pmatrix}^{\mathrm{t}} \\&\quad + \begin{pmatrix} 0_{p \times q} \\ \widehat{J}_{\mathrm{cox}} \end{pmatrix}^{\mathrm{t}}. \end{aligned}$$
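As anticipated in item 1, here is a minimal Python sketch of the estimators in items 1–3, reusing risk_set_sum from the sketch above; items 4–7 follow the same plug-in pattern once the parametric ingredients \(\psi \), \(\alpha _\mathrm{pm}\), \(A_\mathrm{pm}\) and \(A^{\mathrm{d}}_\mathrm{pm}\) are supplied. The names are again illustrative only.

```python
import numpy as np

def sigma2_hat(t, beta_cox, T, D, X):
    """Item 1: sigma^2(t) estimated by sum_{T_i <= t} n D_i / R_n^(0)(T_i; beta_cox)^2."""
    n = len(T)
    return sum(n / risk_set_sum(Ti, 0, beta_cox, T, X) ** 2
               for Ti, Di in zip(T, D) if Di == 1 and Ti <= t)

def F_hat(t, beta_cox, T, D, X):
    """Item 2: F(t) estimated by sum_{T_i <= t} D_i E_n(T_i) / R_n^(0)(T_i),
    where E_n = R_n^(1) / R_n^(0)."""
    out = np.zeros(X.shape[1])
    for Ti, Di in zip(T, D):
        if Di == 1 and Ti <= t:
            r0 = risk_set_sum(Ti, 0, beta_cox, T, X)
            out += risk_set_sum(Ti, 1, beta_cox, T, X) / r0 ** 2
    return out

def J_cox_hat(tau, beta_cox, T, D, X):
    """Item 3: (1/n) sum_{T_i <= tau} {R_n^(2)/R_n^(0) - E_n E_n'} D_i."""
    n, p = X.shape
    out = np.zeros((p, p))
    for Ti, Di in zip(T, D):
        if Di == 1 and Ti <= tau:
            r0 = risk_set_sum(Ti, 0, beta_cox, T, X)
            En = risk_set_sum(Ti, 1, beta_cox, T, X) / r0
            out += risk_set_sum(Ti, 2, beta_cox, T, X) / r0 - np.outer(En, En)
    return out / n
```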

Relying strictly on the plug-in principle has the beneficial property that all estimators are consistent. This follows from the continuous mapping theorem since the precise formulae for the quantities in (39) are all seen to be continuous in the quantities and functions (in their appropriate spaces) for which we employ the plug-in principle.

To arrive at consistent estimators for \(v_\mathrm{cox}\), \(v_c\) and \(v_\mathrm{pm}\) for the classes of focus parameters we have investigated, one typically also needs consistent estimators for the quantities \(m'_\mathrm{pm}, m'_\mathrm{cox}, z'_\mathrm{pm}, z'_\mathrm{cox}\), \(\zeta _\mathrm{pm}(\cdot )\), \(\zeta _\mathrm{cox}(\cdot ), V_{t,\mathrm{pm}}(\cdot ), V_{t,\mathrm{cox}}(\cdot ), h_\mathrm{pm}(\phi _\mathrm{pm})\) and \(h_\mathrm{cox}(\phi _\mathrm{cox})\), as described in Section 4.1. All except the last of these are continuous when viewed as functions of the unknown quantities \(\theta _0, \beta _0, \beta _\mathrm{true}\) and \(A_\mathrm{true}(\cdot )\), and are therefore estimated consistently by plugging in empirical analogues, as above. The last quantity \(h_\mathrm{cox}(\phi _\mathrm{cox})=\alpha _\mathrm{true}(\phi _\mathrm{cox})\exp (x^{\mathrm{t}}\beta _\mathrm{true})\), with \(\phi _\mathrm{cox}=A^{-1}_\mathrm{true}(-\log (1-u)/\exp (x^{\mathrm{t}}\beta _\mathrm{true}))\) involved in the estimation of a quantile (see Sect. 3 in the supplementary material (Jullum and Hjort, this work)), is more delicate, as we need the estimator to be smooth, or at least nonzero. The troublesome part is the estimation of \(\alpha _\mathrm{true}\) at the unknown position \(\phi _\mathrm{cox}\). This position is estimated by \({\widehat{\phi }}_\mathrm{cox}={\widehat{A}}^{-1}_\mathrm{cox}(-\log (1-u)/\exp (x^{\mathrm{t}}{\widehat{\beta }}_\mathrm{cox}))\), while a smooth estimate of \(\alpha _\mathrm{true}\) is obtained e.g. via a kernel estimator \({\widehat{\alpha }}_\mathrm{cox}(t) = \int h^{-1} K^\circ ((t-s)/h) \, \mathrm{d}{\widehat{A}}_\mathrm{cox}(s)\) for some suitable kernel \(K^\circ \) and bandwidth \(h=h_n\), which is then evaluated at \({\widehat{\phi }}_\mathrm{cox}\). As long as \(h_n \rightarrow 0\) and \(n h_n \rightarrow \infty \), and \(\alpha _\mathrm{true}\) is positive and twice differentiable in a neighborhood of \(\phi _\mathrm{cox}\), this strategy also yields a consistent estimator. Thus, replacing the quantities in the various forms of \(v_\mathrm{cox}\), \(v_c\) and \(v_\mathrm{pm}\) towards the end of Section 4.1 by the estimators presented in this appendix yields consistent estimators \({\widehat{v}}_\mathrm{cox}\), \({\widehat{v}}_c\) and \({\widehat{v}}_\mathrm{pm}\).
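As a sketch of this last, more delicate step, the kernel estimator \({\widehat{\alpha }}_\mathrm{cox}\) and the estimated position \({\widehat{\phi }}_\mathrm{cox}\) may be coded as follows, once more reusing risk_set_sum. The Gaussian kernel and the simple inversion of \({\widehat{A}}_\mathrm{cox}\) over the observed event times are our own illustrative choices; the paper requires only a suitable kernel \(K^\circ \) and bandwidth \(h_n\).

```python
import numpy as np

def A_cox_hat(t, beta_cox, T, D, X):
    """Breslow-type estimator of the cumulative baseline hazard at time t."""
    return sum(1.0 / risk_set_sum(Ti, 0, beta_cox, T, X)
               for Ti, Di in zip(T, D) if Di == 1 and Ti <= t)

def alpha_cox_hat(t, h, beta_cox, T, D, X):
    """Kernel-smoothed hazard int h^{-1} K((t - s)/h) dA_cox(s): a kernel-weighted
    sum over the observed event times (Gaussian kernel chosen for illustration)."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return sum(K((t - Ti) / h) / (h * risk_set_sum(Ti, 0, beta_cox, T, X))
               for Ti, Di in zip(T, D) if Di == 1)

def phi_cox_hat(u, x, beta_cox, T, D, X):
    """phi_cox = A_cox^{-1}(-log(1 - u) / exp(x' beta)): the first event time at
    which the cumulative hazard estimate reaches the target level."""
    target = -np.log(1.0 - u) / np.exp(x @ beta_cox)
    for Ti in np.sort(T[D == 1]):
        if A_cox_hat(Ti, beta_cox, T, D, X) >= target:
            return Ti
    return np.inf  # target level not reached within the observed range
```

The estimate of \(h_\mathrm{cox}(\phi _\mathrm{cox})\) is then alpha_cox_hat evaluated at phi_cox_hat, multiplied by \(\exp (x^{\mathrm{t}}{\widehat{\beta }}_\mathrm{cox})\).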


Cite this article

Jullum, M., Hjort, N.L. What price semiparametric Cox regression?. Lifetime Data Anal 25, 406–438 (2019). https://doi.org/10.1007/s10985-018-9450-7
