Abstract
Cox’s proportional hazards regression model is the standard method for modelling censored life-time data with covariates. In its standard form, this method relies on a semiparametric proportional hazards structure, leaving the baseline unspecified. Naturally, specifying a parametric model also for the baseline hazard, leading to fully parametric Cox models, will be more efficient when the parametric model is correct, or close to correct. The aim of this paper is two-fold. (a) We compare parametric and semiparametric models in terms of their asymptotic relative efficiencies when estimating different quantities. We find that for some quantities the gain of restricting the model space is substantial, while it is negligible for others. (b) To deal with such selection in practice we develop certain focused and averaged focused information criteria (FIC and AFIC). These aim at selecting the most appropriate proportional hazards models for given purposes. Our methodology applies also to the simpler case without covariates, when comparing Kaplan–Meier and Nelson–Aalen estimators to parametric counterparts. Applications to real data are also provided, along with analyses of theoretical behavioural aspects of our methods.
Notes
Slightly adjusted estimators not influencing the theory may typically be applied when there are tied events, see e.g. Aalen et al. (2008, Ch. 3.1.3).
If \(\gamma \) influences the censoring mechanism and covariate distribution, then (11) is only a ‘partial’ likelihood, and not a true one. This has no consequences for inference, however.
We have avoided introducing the notation of Hadamard differentiability tangentially to a subset of \(\mathbb {D}\), as these notions are better stated explicitly in our concrete cases.
References
Aalen OO, Gjessing HK (2001) Understanding the shape of the hazard rate: a process point of view [with discussion and a rejoinder]. Stat Sci 16:1–22
Aalen OO, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, Berlin
Borgan Ø (1984) Maximum likelihood estimation in parametric counting process models, with applications to censored failure time data. Scand J Stat 11:1–16
Breslow NE (1972) Contribution to the discussion of the paper by D.R. Cox. J R Stat Soc Ser B 34:216–217
Claeskens G, Hjort NL (2003) The focused information criterion [with discussion and a rejoinder]. J Am Stat Assoc 98:900–916
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge
Cox DR (1972) Regression models and life-tables [with discussion and a rejoinder]. J R Stat Soc Ser B 34:187–220
Efron B (1977) The efficiency of Cox’s likelihood function for censored data. J Am Stat Assoc 72:557–565
Hjort NL (1985) Bootstrapping Cox’s regression model. Tech. rep., Department of Statistics, Stanford University
Hjort NL (1990) Goodness of fit tests in models for life history data based on cumulative hazard rates. Ann Stat 18:1221–1258
Hjort NL (1992) On inference in parametric survival data models. Int Stat Rev 60:355–387
Hjort NL (2008) Focused information criteria for the linear hazard regression model. In: Vonta F, Nikulin M, Limnios N, Huber-Carol C (eds) Statistical models and methods for biomedical and technical systems. Birkhäuser, Boston, pp 487–502
Hjort NL, Claeskens G (2003) Frequentist model average estimators [with discussion and a rejoinder]. J Am Stat Assoc 98:879–899
Hjort NL, Claeskens G (2006) Focused information criteria and model averaging for the Cox hazard regression model. J Am Stat Assoc 101:1449–1464
Hjort NL, Pollard DB (1993) Asymptotics for minimisers of convex processes. Tech. rep., Department of Mathematics, University of Oslo
Jeong JH, Oakes D (2003) On the asymptotic relative efficiency of estimates from Cox’s model. Sankhya 65:422–439
Jeong JH, Oakes D (2005) Effects of different hazard ratios on asymptotic relative efficiency estimates from Cox’s model. Commun Stat Theory Methods 34:429–448
Jullum M, Hjort NL (2017) Parametric or nonparametric: The FIC approach. Stat Sin 27:951–981
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Meier P, Karrison T, Chappell R, Xie H (2004) The price of Kaplan-Meier. J Am Stat Assoc 99:890–896
Miller R (1983) What price Kaplan-Meier? Biometrics 39:1077–1081
Oakes D (1977) The asymptotic information in censored survival data. Biometrika 64:441–448
van der Vaart A (2000) Asymptotic statistics. Cambridge University Press, Cambridge
Acknowledgements
Our efforts have been supported in part by the Norwegian Research Council, through the project FocuStat (Focus Driven Statistical Inference With Complex Data) and the research-based innovation centre Statistics for Innovation (sfi)\(^2\). We are also grateful to the reviewers and editor Mei-Ling T. Lee for constructive comments which led to an improved presentation.
Appendix
Estimating variances and covariances
For FIC and AFIC applications we need not only the focus parameter estimators \({{\widehat{\mu }}}_\mathrm{cox}\) and \({{\widehat{\mu }}}_\mathrm{pm}\) themselves (yielding also \({\widehat{b}}={{\widehat{\mu }}}_\mathrm{pm}-{{\widehat{\mu }}}_\mathrm{cox}\)), but also (consistent) recipes for estimating the quantities \(v_\mathrm{cox}\), \(v_c\), \(v_\mathrm{pm}\) making up the covariance matrix \(\Sigma _\mu \) in (31). The main ingredient in \(\Sigma _\mu \) is \(\Sigma (s,t)\), with blocks as in (27), consisting of the quantities in (39).
In this appendix we provide explicit consistent estimators for these quantities, in addition to a simple consistent estimation strategy for other quantities typically involved in \(\Sigma _\mu \).
The principle we essentially follow is to insert the empirical analogues of all unknown quantities. This amounts firstly to estimating \(\beta _\mathrm{true}\), \(\beta _0\), \(\theta _0\), \(A_\mathrm{true}(\cdot )\), by respectively \({\widehat{\beta }}_\mathrm{cox}\), \({\widehat{\beta }}_\mathrm{pm}\), \(\widehat{\theta }\), \({\widehat{A}}_\mathrm{cox}(\cdot )\). Secondly, \(r^{(k)}(s;h(\beta _\mathrm{true},\beta _0))\) is estimated by \(n^{-1}R^{(k)}_n(s;h({\widehat{\beta }}_\mathrm{cox},{\widehat{\beta }}_\mathrm{pm}))\) for \(k=0,1,2\), and h some simple continuous function combining \(\beta \) and \(\beta _0\). For f some vector function involving unknown quantities, integrals of the form \(\int _0^t f\alpha _\mathrm{true}\,\mathrm{d}s=\int _0^t f\, \mathrm{d}A_\mathrm{true}\) are then estimated by \(\int _0^t {\widehat{f}}\,\mathrm{d}{\widehat{A}}_\mathrm{cox}= \sum _{T_i \le t} {\widehat{f}}(T_i)D_i/R^{(0)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})\). Note also that integrals \(\int _0^t f(s)r^{(k)}(s;h(\beta _\mathrm{true},\beta _0))\,\mathrm{d}s\) are estimated by \(n^{-1}\int _0^t {\widehat{f}}(s) R_n^{(k)}(s;h({\widehat{\beta }}_\mathrm{cox},{\widehat{\beta }}_\mathrm{pm}))\, \mathrm{d}s\), which may be expressed as the sum
where \(R_{(i)}^{(k)}(h(\cdot ))=R_{(i)}^{(k)}(0;h(\cdot ))\) equals \(\exp \{X_i^{\mathrm{t}}h(\cdot )\}\), \(X_i \exp \{X_i^{\mathrm{t}}h(\cdot )\}\) and \(X_i X_i^{\mathrm{t}}\exp \{X_i^{\mathrm{t}}h(\cdot )\}\) for \(k=0,1,2\), respectively. Thus, estimators of the form \(\int _0^t f(s)g^{(k)}(s;\beta )\, \mathrm{d}s\) may be expressed by
with \({\widehat{\beta }}\) inserted to estimate \(\beta \). The f-function is sometimes partly estimated by a step-function, as when f(s) equals \(A(s)f_1(s)\), \(\sigma ^2(\min (s,t))f_1(s)\) or \(F(s)f_1(s)\) for some function \(f_1\). In such cases, integrals like \(\int _0^{t} f(s)r^{(k)}(s;h(\beta ,\beta _0))\,\mathrm{d}s\) are decomposed even further. To see this, assume \(f(s)=f_0(s)f_1(s)\) is estimated by \({\widehat{f}}(s)={\widehat{f}}_0(s){\widehat{f}}_1(s)\), where \({\widehat{f}}_0(s)\) is a step function of the form \({\widehat{f}}_0(s)=\sum _{j=1}^n \text {step}_{j} \mathbf {1}_{\{ T_j \le s \}}=\sum _{j:T_j\le s} \text {step}_j\). Then (40) decomposes further into the ‘triangle sum’
As a consequence, also \(\int _0^t f(s)g^{(k)}(s;\beta )\, \mathrm{d}s\) decomposes further, such that the subtrahend in (41) equals
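The general plug-in recipe above is easy to carry out numerically. The following sketch (all data, names and choices are ours for illustration, not the authors' code) implements the risk-set quantities \(R_n^{(k)}(s;\beta )\) and the basic integral estimator \(\int _0^t {\widehat{f}}\,\mathrm{d}{\widehat{A}}_\mathrm{cox}= \sum _{T_i \le t} {\widehat{f}}(T_i)D_i/R^{(0)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})\) on simulated right-censored data, treating a fixed vector as a stand-in for the fitted \({\widehat{\beta }}_\mathrm{cox}\):

```python
import numpy as np

# Hypothetical right-censored sample: times T, event indicators D,
# covariate rows X; beta_hat plays the role of the fitted Cox estimate.
rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))
beta_hat = np.array([0.5, -0.3])
T = rng.exponential(scale=np.exp(-X @ beta_hat))
D = (rng.uniform(size=n) < 0.7).astype(float)

def R_n(s, beta, k):
    """Risk-set quantity R_n^{(k)}(s; beta): sum over i with T_i >= s of
    x_i^{(k)} exp(x_i' beta), where x^{(0)} = 1, x^{(1)} = x, x^{(2)} = x x'."""
    w = np.exp(X @ beta) * (T >= s)
    if k == 0:
        return float(w.sum())
    if k == 1:
        return X.T @ w
    return (X * w[:, None]).T @ X  # sum of w_i * x_i x_i'

def plugin_integral(f_hat, t, beta):
    """Estimate int_0^t f dA_true by the sum over event times T_i <= t of
    f_hat(T_i) D_i / R_n^{(0)}(T_i; beta)."""
    return sum(f_hat(Ti) / R_n(Ti, beta, 0)
               for Ti in T[(T <= t) & (D == 1)])

# With f == 1 this is exactly the Breslow estimator A_hat_cox(t).
A_hat = plugin_integral(lambda s: 1.0, float(np.median(T)), beta_hat)
```

Every estimator in this appendix is some combination of such sums over event times, possibly with an extra inner sum when a step-function factor enters.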
Let us now turn to the actual estimation of the quantities in (39).
1.
First, consider \(\sigma ^2(t)\) as given in (9). The estimation strategy outlined above gives the estimator
$$\begin{aligned} {\widehat{\sigma }}^2(t)=\int _0^t \frac{\mathrm{d}{{\widehat{A}}}_\mathrm{cox}(s)}{ n^{-1}R_n^{(0)}(s;{{\widehat{\beta }}}_\mathrm{cox})} =\sum _{T_i\le t}\frac{nD_i}{ \{R^{(0)}_n(T_i;{{\widehat{\beta }}}_\mathrm{cox})\}^2}. \end{aligned}$$
2.
Next consider F(t) as given in (9). Writing \(E_n(s;\beta )\) for \(R_n^{(1)}(s;\beta )/R_n^{(0)}(s;\beta )\), this function is similarly estimated by
$$\begin{aligned} {\widehat{F}}(t)=\int _0^t E_n(s;{\widehat{\beta }}_\mathrm{cox})\,\mathrm{d}{{\widehat{A}}}_\mathrm{cox}(s) =\sum _{T_i\le t} \frac{D_i E_n(T_i;{\widehat{\beta }}_\mathrm{cox})}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})}. \end{aligned}$$
3.
Consider now \(J_\mathrm{cox}\) as given in (7). Following the plug-in procedure, we get
$$\begin{aligned} \widehat{J}_\mathrm{cox}=\frac{1}{n} \sum _{T_i \le \tau } \left\{ \frac{R^{(2)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})}{R^{(0)}_n(T_i; {\widehat{\beta }}_\mathrm{cox})} - E_n(T_i;{\widehat{\beta }}_\mathrm{cox})E_n(T_i;{\widehat{\beta }}_\mathrm{cox})^{\mathrm{t}}\right\} D_i. \end{aligned}$$
Alternatively, \(J_\mathrm{cox}\) may be estimated by \(n^{-1}\) times minus the Hessian matrix of the log-partial likelihood in (4).
4.
Consider J as given in (14) with blocks as in (16). Following the plug-in procedure, we estimate J by \(\widehat{J}\) having blocks
$$\begin{aligned} \widehat{J}_{11}&= \frac{1}{n}\sum _{i=1}^nR_{(i)}^{(0)}({\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} \{\psi (s;\widehat{\theta })\psi (s;\widehat{\theta })^{\mathrm{t}} +\psi ^\mathrm{d}(s;\widehat{\theta })\}\alpha _{\mathrm{pm}}(s;\widehat{\theta })\, \mathrm{d}s\\&\quad -\frac{1}{n} \sum _{i=1}^n\psi ^\mathrm{d}(T_i;\widehat{\theta })D_i , \\ \widehat{J}_{12}&= \widehat{J}_{21}^{\mathrm{t}} = \frac{1}{n}\sum _{i=1}^n\int _0^{T_i} \psi (s;\widehat{\theta })\alpha _{\mathrm{pm}}(s;\widehat{\theta }) \, \mathrm{d}s R_{(i)}^{(1)}({\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}\\&= \frac{1}{n}\sum _{i=1}^nA^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })R_{(i)}^{(1)} ({\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}, \\ \widehat{J}_{22}&= \frac{1}{n}\sum _{i=1}^nR_{(i)}^{(2)} ({\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} \alpha _{\mathrm{pm}} (s;\widehat{\theta }) \, \mathrm{d}s =\frac{1}{n}\sum _{i=1}^nR_{(i)}^{(2)} ({\widehat{\beta }}_\mathrm{pm}) A_{\mathrm{pm}}(T_i;\widehat{\theta }). \end{aligned}$$
Similarly to \(J_\mathrm{cox}\), J may be estimated by \(n^{-1}\) times minus the Hessian of the parametric log-likelihood in (11).
5.
We continue with K as given in (14). The plug-in procedure applied to the formulae in (17) results in K being estimated by \(\widehat{K}\) having blocks
$$\begin{aligned} \widehat{K}_{11}&= \frac{1}{n}\sum _{i=1}^n\bigg [ \psi (T_i;\widehat{\theta })\psi (T_i;\widehat{\theta })^{\mathrm{t}}\\&\quad - \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })\psi (T_i;\widehat{\theta })^{\mathrm{t}} + \psi (T_i;\widehat{\theta })A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })^{\mathrm{t}}\} \frac{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})}\bigg ]D_i\\&\quad + \frac{1}{n}\sum _{i=1}^nR_{(i)}^{(0)}(2{\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} [A^{\mathrm{d}}_\mathrm{pm}(s;\widehat{\theta })\psi (s;\widehat{\theta })^{\mathrm{t}}\\&\quad + \psi (s;\widehat{\theta })A^{\mathrm{d}}_\mathrm{pm}(s;\widehat{\theta })^{\mathrm{t}}] \alpha _{\mathrm{pm}}(s;\widehat{\theta })\, \mathrm{d}s,\\ \widehat{K}_{12}&=\widehat{K}_{21}^{\mathrm{t}}= \frac{1}{n}\sum _{i=1}^n\bigg [\psi (T_i;\widehat{\theta }) E_n(T_i;{\widehat{\beta }}_\mathrm{cox})^{\mathrm{t}}\\&\quad - \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta })+\psi (T_i;\widehat{\theta }) A_\mathrm{pm}(T_i;\widehat{\theta })\}\frac{R_n^{(1)}(T_i; {\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}}{R_n^{(0)} (T_i;{\widehat{\beta }}_\mathrm{cox})} \bigg ]D_i \\&\quad + \frac{1}{n}\sum _{i=1}^n\left[ \int _0^{T_i} \{A^{\mathrm{d}}_\mathrm{pm}(s;\widehat{\theta })+\psi (s; \widehat{\theta })A_\mathrm{pm}(s;\widehat{\theta })\} \alpha _{\mathrm{pm}} (s;\widehat{\theta })\, \mathrm{d}s \right] R_{(i)}^{(1)} (2{\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}}, \\ \widehat{K}_{22}&=\frac{1}{n}\sum _{i=1}^n\frac{R_n^{(2)}(T_i; {\widehat{\beta }}_{\mathrm{cox}})-2R_n^{(2)}(T_i;{\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})A_\mathrm{pm}(T_i;\widehat{\theta })}{R_n^{(0)}(T_i;{\widehat{\beta }}_{\mathrm{cox}})} D_i \\&\quad + \frac{2}{n} \sum _{i=1}^nR_{(i)}^{(2)}(2{\widehat{\beta }}_\mathrm{pm}) \int _0^{T_i} \alpha _{\mathrm{pm}}(s;\widehat{\theta })A_\mathrm{pm}(s;\widehat{\theta })\, \mathrm{d}s. \end{aligned}$$
6.
We go on to the covariance \(\nu (t)=\mathrm{Cov}(W(t),U^{\mathrm{t}})\) as given in (29). This covariance formula may be estimated by
$$\begin{aligned} \widehat{\nu }(t)&= \begin{pmatrix} \sum _{T_i \le t} D_i\psi (T_i;\widehat{\theta })/R_n^{(0)} (T_i;{\widehat{\beta }}_\mathrm{cox}) \\ \widehat{F}(t) \end{pmatrix}^{\mathrm{t}}\\&\quad - \frac{1}{n}\sum _{i=1}^n\frac{D_i \widehat{\sigma }^2(\min (T_i,t))}{R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} R_n^{(0)}(T_i;2{\widehat{\beta }}_\mathrm{cox}) \psi (T_i;\widehat{\theta }) \\ R_n^{(1)}(T_i;2{\widehat{\beta }}_\mathrm{cox}) \end{pmatrix}^{\mathrm{t}} \\&\quad + \sum _{i=1}^n\sum _{j:T_j < \min (T_i,t)} \frac{D_j}{R^{(0)}_n(T_j;{\widehat{\beta }}_\mathrm{cox})^2} \begin{pmatrix} R_{(i)}^{(0)}({\widehat{\beta }}_\mathrm{pm}+{\widehat{\beta }}_\mathrm{cox}) \{A^{\mathrm{d}}_{\mathrm{pm}}(T_i;\widehat{\theta })-A^{\mathrm{d}}_\mathrm{pm}(T_j;\widehat{\theta })\} \\ R_{(i)}^{(1)}({\widehat{\beta }}_\mathrm{pm}+{\widehat{\beta }}_\mathrm{cox}) \{A_\mathrm{pm}(T_i;\widehat{\theta })-A_\mathrm{pm}(T_j;\widehat{\theta })\} \end{pmatrix}^{\mathrm{t}}. \end{aligned}$$
7.
Finally, we estimate the covariance \(G=\mathrm{Cov}(U_\mathrm{cox},U^{\mathrm{t}})\) as given in (28). We use
$$\begin{aligned} \widehat{G}&= - \frac{1}{n}\sum _{i=1}^n\frac{D_i}{R^{(0)}_n(T_i;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} \psi (T_i;\widehat{\theta })\{{\widehat{A}}_\mathrm{cox}(T_i) R^{(1)}_n(T_i;2{\widehat{\beta }}_\mathrm{cox})^{\mathrm{t}} - R^{(0)}_n(T_i; 2{\widehat{\beta }}_\mathrm{cox})\widehat{F}(T_i)^{\mathrm{t}}\} \\ {\widehat{A}}_ \mathrm{cox}(T_i)R^{(2)}_n(T_i;2{\widehat{\beta }}_\mathrm{cox}) - R^{(1)}_n(T_i; 2{\widehat{\beta }}_\mathrm{cox})\widehat{F}(T_i)^{\mathrm{t}} \end{pmatrix}^{\mathrm{t}} \\&\quad - \frac{1}{n} \sum _{i=1}^n\sum _{j:T_j \le T_i} \frac{D_j E_n(T_j;{\widehat{\beta }}_\mathrm{cox})}{R_n^{(0)}(T_j;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} R^{(0)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm}) \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta }) - A^{\mathrm{d}}_\mathrm{pm}(T_j;\widehat{\theta })\} \\ R^{(1)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm}) \{A_\mathrm{pm}(T_i;\widehat{\theta })-A_\mathrm{pm}(T_j;\widehat{\theta })\} \end{pmatrix}^{\mathrm{t}} \\&\quad + \frac{1}{n} \sum _{i=1}^n\sum _{j:T_j \le T_i} \frac{D_j}{R^{(0)}_n(T_j;{\widehat{\beta }}_\mathrm{cox})} \begin{pmatrix} \{A^{\mathrm{d}}_\mathrm{pm}(T_i;\widehat{\theta }) - A^{\mathrm{d}}_\mathrm{pm}(T_j;\widehat{\theta })\} R^{(1)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm})^{\mathrm{t}} \\ \{A_\mathrm{pm}(T_i;\widehat{\theta })-A_\mathrm{pm}(T_j;\widehat{\theta })\} R^{(2)}_{(i)}({\widehat{\beta }}_\mathrm{cox}+{\widehat{\beta }}_\mathrm{pm}) \end{pmatrix}^{\mathrm{t}} \\&\quad + \begin{pmatrix} 0_{p \times q} \\ \widehat{J}_{\mathrm{cox}} \end{pmatrix}^{\mathrm{t}}. \end{aligned}$$
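To make the simpler of these recipes concrete, here is a self-contained numerical sketch of items 1–3 (\({\widehat{\sigma }}^2(t)\), \({\widehat{F}}(t)\) and \(\widehat{J}_\mathrm{cox}\)) on simulated data. All names and data are illustrative; in particular, a fixed `beta` stands in for the fitted \({\widehat{\beta }}_\mathrm{cox}\), which in practice comes from maximising the partial likelihood:

```python
import numpy as np

# Simulated right-censored data with two covariates (illustrative only).
rng = np.random.default_rng(1)
n, p = 40, 2
X = rng.normal(size=(n, p))
beta = np.array([0.4, 0.1])          # stand-in for beta_hat_cox
T = rng.exponential(scale=np.exp(-X @ beta))
D = (rng.uniform(size=n) < 0.8).astype(float)

def R0(s):
    return float(np.sum((T >= s) * np.exp(X @ beta)))

def R1(s):
    return X.T @ ((T >= s) * np.exp(X @ beta))

def R2(s):
    w = (T >= s) * np.exp(X @ beta)
    return (X * w[:, None]).T @ X

def sigma2_hat(t):
    """Item 1: sum over T_i <= t of n D_i / R_n^{(0)}(T_i)^2."""
    return float(sum(n * D[i] / R0(T[i]) ** 2 for i in range(n) if T[i] <= t))

def F_hat(t):
    """Item 2: sum over events T_i <= t of D_i E_n(T_i) / R_n^{(0)}(T_i),
    with E_n = R_n^{(1)} / R_n^{(0)}."""
    out = np.zeros(p)
    for i in range(n):
        if T[i] <= t and D[i] == 1.0:
            out += R1(T[i]) / R0(T[i]) ** 2
    return out

def J_cox_hat():
    """Item 3: (1/n) sum over events of {R^{(2)}/R^{(0)} - E_n E_n'}."""
    out = np.zeros((p, p))
    for i in range(n):
        if D[i] == 1.0:
            E = R1(T[i]) / R0(T[i])
            out += R2(T[i]) / R0(T[i]) - np.outer(E, E)
    return out / n

tmax = float(T.max())
s2 = sigma2_hat(tmax)   # nondecreasing and nonnegative in t
Jm = J_cox_hat()        # symmetric, positive semi-definite by construction
```

Each per-event term of \(\widehat{J}_\mathrm{cox}\) is a weighted covariance matrix of the covariates over the risk set, which is why the estimate is automatically symmetric and positive semi-definite. The heavier blocks \(\widehat{J}\), \(\widehat{K}\), \(\widehat{\nu }(t)\) and \(\widehat{G}\) follow the same pattern, with additional inner sums for the triangle-sum terms.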
Relying strictly on the plug-in principle has the beneficial property that all estimators are consistent. This follows from the continuous mapping theorem since the precise formulae for the quantities in (39) are all seen to be continuous in the quantities and functions (in their appropriate spaces) for which we employ the plug-in principle.
To arrive at consistent estimators for \(v_\mathrm{cox}, v_c\) and \(v_\mathrm{pm}\) for the classes of focus parameters we have investigated, one typically needs consistent estimators also for the quantities \(m'_\mathrm{pm}, m'_\mathrm{cox}, z'_\mathrm{pm}, z'_\mathrm{cox}\), \(\zeta _\mathrm{pm}(\cdot )\), \(\zeta _\mathrm{cox}(\cdot ), V_{t,\mathrm{pm}}(\cdot ), V_{t,\mathrm{cox}}(\cdot ), h_\mathrm{pm}(\phi _\mathrm{pm})\) and \(h_\mathrm{cox}(\phi _\mathrm{cox})\), as described in Section 4.1. All except the last of these are continuous when viewed as functions of the unknown quantities \(\theta _0, \beta _0, \beta _\mathrm{true}\) and \(A_\mathrm{true}(\cdot )\), and are therefore estimated consistently by plugging in empirical analogues, as above. The last quantity \(h_\mathrm{cox}(\phi _\mathrm{cox})=\alpha _\mathrm{true}(\phi _\mathrm{cox})\exp (x^{\mathrm{t}}\beta _\mathrm{true})\), with \(\phi _\mathrm{cox}=A^{-1}_\mathrm{true}(-\log (1-u)/\exp (x^{\mathrm{t}}\beta _\mathrm{true}))\), involved in estimation of a quantile (see Sect. 3 of the supplementary material, Jullum and Hjort, this work), is more delicate, as we need the estimator to be smooth or at least nonzero. The troublesome part is estimation of \(\alpha _\mathrm{true}\) at the unknown position \(\phi _\mathrm{cox}\). This position is estimated by \({\widehat{\phi }}_\mathrm{cox}={\widehat{A}}^{-1}_\mathrm{cox}(-\log (1-u)/\exp (x^{\mathrm{t}}{\widehat{\beta }}_\mathrm{cox}))\), while a smooth estimate of \(\alpha _\mathrm{true}\) is obtained e.g. via a kernel estimator \({\widehat{\alpha }}_\mathrm{cox}(t) = \int h^{-1} K^\circ ((t-s)/h) \, \mathrm{d}{\widehat{A}}_\mathrm{cox}(s)\) for some suitable kernel \(K^\circ \) and bandwidth \(h=h_n\), which is then evaluated at \({\widehat{\phi }}_\mathrm{cox}\).
As long as the bandwidth satisfies \(h_n \rightarrow 0\) and \(n h_n \rightarrow \infty \), and \(\alpha _\mathrm{true}\) is positive and twice differentiable in a neighborhood of \(\phi _\mathrm{cox}\), this strategy also yields a consistent estimator. Thus, replacing the quantities in the various forms of \(v_\mathrm{cox}\), \(v_c\), \(v_\mathrm{pm}\) towards the end of Section 4.1 by the estimators presented in this appendix yields consistent estimators \({\widehat{v}}_\mathrm{cox}\), \({\widehat{v}}_c\), \({\widehat{v}}_\mathrm{pm}\).
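The kernel smoothing step can be sketched in the covariate-free special case, where \(\mathrm{d}{\widehat{A}}\) reduces to the Nelson–Aalen increments \(D_i/Y(T_i)\); with covariates one would instead use the Breslow increments \(D_i/R_n^{(0)}(T_i;{\widehat{\beta }}_\mathrm{cox})\). The sample size, Gaussian kernel and bandwidth below are illustrative choices of ours:

```python
import numpy as np

# Uncensored Exp(1) data: the true hazard is constant, equal to 1.
rng = np.random.default_rng(2)
T = np.sort(rng.exponential(size=200))
n = len(T)
dA = 1.0 / (n - np.arange(n))   # Nelson-Aalen jumps D_i / Y(T_i), sorted times

def alpha_hat(t, h=0.3):
    """Kernel-smoothed hazard: sum_i h^{-1} K((t - T_i)/h) * dA_i,
    with a Gaussian kernel K."""
    u = (t - T) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(K * dA) / h)

a1 = alpha_hat(1.0)   # should be close to the true hazard value 1
```

In practice the bandwidth would be chosen to shrink with \(n\) (e.g. \(h_n \propto n^{-1/5}\)) so that the consistency conditions \(h_n \rightarrow 0\), \(n h_n \rightarrow \infty \) hold.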
Jullum, M., Hjort, N.L. What price semiparametric Cox regression? Lifetime Data Anal 25, 406–438 (2019). https://doi.org/10.1007/s10985-018-9450-7