Appendix
1.1 Proof of Proposition 1
Assuming that \({G_\theta }(x)\) is differentiable, with \({g_\theta }(x) = \frac{d}{{dx}}{G_\theta }(x)\), we can substitute \(x = G_\theta ^{ - 1}(v)\), so that \(dx = \frac{{dv}}{{{g_\theta }(G_\theta ^{ - 1}(v))}} = \frac{{dv}}{{{g_\theta }(x)}}\). This allows us to rewrite Eq. (6) as
$$\begin{aligned} \int _0^1 {\prod \limits _{j = 1}^n {{{\mathrm {C}}_{\varvec{\cdot }|\,V}}({u_j}|v)\,dv} }=\int _0^1 {\prod \limits _{j = 1}^n {\exp \left\{ { - G_\theta ^{ - 1}(v){\varphi ^{-1} _\theta }({u_j})} \right\} dv} }. \end{aligned}$$
(13)
By subtracting \(\int _0^1 {\prod \limits _{j = 1}^n {\exp \left\{ { - G_\theta ^{ - 1}(v){\varphi ^{-1} _\theta }({u_j})} \right\} dv} }\) from both sides of Eq. (13), we get
$$\begin{aligned}&\int _0^1 {\prod \limits _{j = 1}^n {{{\mathrm {C}}_{\varvec{\cdot }|\,V}}({u_j}|v)\,dv} } - \int _0^1 {\prod \limits _{j = 1}^n {\exp \left\{ { - G_\theta ^{ - 1}(v){\varphi ^{-1} _\theta }({u_j})} \right\} dv} } = 0\\&\quad \Longleftrightarrow \int _0^1 {\left( {\prod \limits _{j = 1}^n {{{\mathrm {C}}_{\varvec{\cdot }|\,V}}({u_j}|v) - \prod \limits _{j = 1}^n {\exp \left\{ { - G_\theta ^{ - 1}(v){\varphi ^{-1} _\theta }({u_j})} \right\} } } } \right) } \,dv = 0. \end{aligned}$$
Hence, for the above condition to hold, it is sufficient that
$$\begin{aligned} \prod \limits _{j = 1}^n {{{\mathrm {C}}_{\varvec{\cdot }|\,V}}({u_j}|v) - \prod \limits _{j = 1}^n {\exp \left\{ { - G_\theta ^{ - 1}(v){\varphi ^{-1} _\theta }({u_j})} \right\} } } = 0, \end{aligned}$$
or, equivalently,
$$\begin{aligned} {{\mathrm {C}}_{\varvec{\cdot }|\,V}}({u_j}|v) = \exp \left\{ { - G_\theta ^{ - 1}(v){\varphi ^{-1} _\theta }({u_j})} \right\} , \end{aligned}$$
which is the conditional distribution derived from
$$\begin{aligned} {{\mathrm {C}}_{\varvec{\cdot }V}}({u_j},v)&= \int _0^v {\exp \left\{ { - G_\theta ^{ - 1}(t){\varphi ^{-1} _\theta }({u_j})} \right\} dt}\nonumber \\&=\int _0^{G_\theta ^{ - 1}(v)} {\exp \left\{ { - s{\varphi ^{-1} _\theta }({u_j})} \right\} d{G_\theta }(s)}, \end{aligned}$$
a bivariate function with the following properties:
1) \({\mathrm {C}}_{\varvec{\cdot } V}(u_j,v)\) is grounded
$$\begin{aligned}\begin{aligned} {{\mathrm {C}}_{\varvec{\cdot } V}}(0,v) = \int _0^v {\exp \left\{ { - G_\theta ^{ - 1}(t)\varphi _\theta ^{ - 1}(0)} \right\} dt = } \int _0^v {0\,dt = 0} \\ {{\mathrm {C}}_{\varvec{\cdot } V}}({u_j},0) = \int _0^0 {\exp \left\{ { - G_\theta ^{ - 1}(t)\varphi _\theta ^{ - 1}({u_j})} \right\} dt} = 0. \\ \end{aligned} \end{aligned}$$
2) \({\mathrm {C}}_{\varvec{\cdot } V}(u_j,v)\) has margins \(u_j\) and v
$$\begin{aligned} {{\mathrm {C}}_{\varvec{\cdot } V}}(1,v)&= \int _0^v {\exp \left\{ { - G_\theta ^{ - 1}(t)\varphi _\theta ^{ - 1}(1)} \right\} dt = } \int _0^v {dt} = v\\ {{\mathrm {C}}_{\varvec{\cdot } V}}({u_j},1)&= \int _0^{G_\theta ^{ - 1}(1)} {\exp \left\{ { - s\varphi _\theta ^{ - 1}({u_j})} \right\} d{G_\theta }(s)} = \int _0^{ + \infty } {\exp \left\{ { - s\varphi _\theta ^{ - 1}({u_j})} \right\} d{G_\theta }(s)}\\&= {\varphi _\theta }\left( {\varphi _\theta ^{ - 1}({u_j})} \right) = {u_j}. \end{aligned}$$
3) \({\mathrm {C}}_{\varvec{\cdot } V}(u_j,v)\) is 2-increasing, i.e., \(\forall u_j,u^{*}_j,v,v^* \in [0,1]\) with \(u_j\le u^{*}_j\), \(v\le v^*\), it follows that \({{\mathrm {C}}_{\varvec{\cdot } V}}(u_j^*,{v^*}) - {{\mathrm {C}}_{\varvec{\cdot } V}}({u_j},{v^*}) - {{\mathrm {C}}_{\varvec{\cdot } V}}(u_j^*,v) + {{\mathrm {C}}_{\varvec{\cdot } V}}({u_j},v) \geqslant 0\)
$$\begin{aligned}&\int _0^{{v^*}} {\left[ {\exp \left\{ { - G_\theta ^{ - 1}(t){\varphi _\theta ^{ - 1}}(u_j^*)} \right\} - \exp \left\{ { - G_\theta ^{ - 1}(t){\varphi _\theta ^{ - 1}}({u_j})} \right\} } \right] } \,dt\\&\qquad - \int _0^v {\left[ {\exp \left\{ { - G_\theta ^{ - 1}(t){\varphi _\theta ^{ - 1}}(u_j^*)} \right\} - \exp \left\{ { - G_\theta ^{ - 1}(t){\varphi _\theta ^{ - 1}}({u_j})} \right\} } \right] } \,dt \geqslant 0 \\&\quad \Longleftrightarrow \int _v^{{v^*}} {\left[ {\exp \left\{ { - G_\theta ^{ - 1}(t){\varphi _\theta ^{ - 1}}(u_j^*)} \right\} - \exp \left\{ { - G_\theta ^{ - 1}(t){\varphi _\theta ^{ - 1}}({u_j})} \right\} } \right] } \,dt \geqslant 0 . \end{aligned}$$
Therefore, \({\mathrm {C}}_{\varvec{\cdot } V}(u_j,v)\) is a bivariate copula, proving that the Archimedean copula model of Prenen et al. (2017a) is a subclass of our model.
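The mixture representation underlying Eq. (13) can be checked numerically for a concrete generator. The sketch below assumes a Clayton generator \(\varphi_\theta(s)=(1+s)^{-1/\theta}\), whose frailty distribution \(G_\theta\) is Gamma with shape \(1/\theta\) and unit scale; the parameter value and evaluation points are purely illustrative. It verifies that \(\int_0^1\prod_j \exp\{-G_\theta^{-1}(v)\,\varphi_\theta^{-1}(u_j)\}\,dv\) recovers the Archimedean copula \(\varphi_\theta\bigl(\sum_j \varphi_\theta^{-1}(u_j)\bigr)\).

```python
import numpy as np
from scipy import stats, integrate

theta = 2.0                      # Clayton dependence parameter (illustrative)
u = np.array([0.3, 0.6, 0.8])    # arbitrary points in (0, 1)

phi = lambda s: (1.0 + s) ** (-1.0 / theta)            # generator: Gamma Laplace transform
phi_inv = lambda t: t ** (-theta) - 1.0                # inverse generator
G_inv = lambda v: stats.gamma.ppf(v, a=1.0 / theta)    # frailty quantile function G^{-1}

# left-hand side: mixture representation over the uniform scale of the frailty
integrand = lambda v: np.exp(-G_inv(v) * phi_inv(u)).prod()
lhs, _ = integrate.quad(integrand, 0.0, 1.0)

# right-hand side: the Archimedean (Clayton) copula phi(sum_j phi^{-1}(u_j))
rhs = phi(phi_inv(u).sum())
print(lhs, rhs)
```

The two values agree to quadrature accuracy, illustrating that the conditional distribution \(\exp\{-G_\theta^{-1}(v)\varphi_\theta^{-1}(u_j)\}\) reproduces the Archimedean model of Prenen et al. (2017a).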
1.2 Proof of Proposition 2
Let \({\tilde{T}}_{i0}=b_i\) and \(\epsilon _{ij}\sim N(0,1)\), such that \({\tilde{T}}_{ij} = \sqrt{\sigma }b_i + \sqrt{1-\sigma } \epsilon _{ij}\). Since \(b_i, \epsilon _{i1},\ldots ,\epsilon _{in_i}\) are independent standard normal random variables, their joint density is \(f_{b_i, \epsilon _{i1},\ldots ,\epsilon _{in_i}}=\phi (b_i)\phi (\epsilon _{i1})\ldots \phi (\epsilon _{in_i})\). Given the set of transformation functions
$$\begin{aligned} {{\tilde{T}}_{i0}}&= {g_{0}}(b_i,\epsilon _{i1},\ldots ,{\epsilon _{in_i}}) = b_i\\ {{\tilde{T}}_{i1}}&= {g_1}(b_i,\epsilon _{i1},\ldots ,{\epsilon _{in_i}}) = \sqrt{\sigma }b_i + \sqrt{1 - \sigma } {\epsilon _{i1}}\\&\,\,\,\, \vdots \\ {{\tilde{T}}_{in_i}}&= {g_{n_i}}(b_i,\epsilon _{i1},\ldots ,{\epsilon _{in_i}}) = \sqrt{\sigma }b_i + \sqrt{1 - \sigma } {\epsilon _{in_i}}, \end{aligned}$$
take the set of inverse transformation functions \(g_0^{ - 1},g_1^{ - 1},\ldots ,g^{ - 1}_{n_i}\)
$$\begin{aligned} b_i&= g_0^{-1}({\tilde{T}}_{i0},{\tilde{T}}_{i1},\ldots ,{\tilde{T}}_{in_i}) = {\tilde{T}}_{i0}\\ {\epsilon _{i1}}&= {g_1^{-1}}({\tilde{T}}_{i0},{\tilde{T}}_{i1},\ldots ,{\tilde{T}}_{in_i}) = \frac{{{{\tilde{T}}_{i1}} - \sqrt{\sigma }{{\tilde{T}}_{i0}}}}{{\sqrt{1 - \sigma } }} \\&\vdots \\ {\epsilon _{in_i}}&= g^{ - 1}_{n_i}({\tilde{T}}_{i0},{\tilde{T}}_{i1},\ldots ,{\tilde{T}}_{in_i}) = \frac{{{{\tilde{T}}_{in_i}} - \sqrt{\sigma }{{\tilde{T}}_{i0}}}}{{\sqrt{1 - \sigma } }}. \end{aligned}$$
It can be shown that the Jacobian of the transformations above is given by
$$\begin{aligned} J=\prod \limits _{j = 1}^{n_i} {\frac{1}{{\sqrt{1 - \sigma } }}}. \end{aligned}$$
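The Jacobian claim can be verified directly: the forward map from \((b_i,\epsilon_{i1},\ldots,\epsilon_{in_i})\) to \(({\tilde{T}}_{i0},\ldots,{\tilde{T}}_{in_i})\) is linear, so the Jacobian of the inverse transformation is the reciprocal of its determinant. A minimal numerical check, with illustrative values of \(\sigma\) and \(n_i\):

```python
import numpy as np

sigma, n = 0.4, 5   # illustrative values of sigma and cluster size n_i

# forward map (b_i, eps_1, ..., eps_n) -> (T0, T1, ..., Tn)
M = np.zeros((n + 1, n + 1))
M[0, 0] = 1.0                                   # T0 = b_i
M[1:, 0] = np.sqrt(sigma)                       # Tj = sqrt(sigma) * b_i + ...
M[1:, 1:] = np.sqrt(1.0 - sigma) * np.eye(n)    # ... + sqrt(1 - sigma) * eps_j

# Jacobian of the inverse transformation = 1 / det(forward map)
J = 1.0 / np.linalg.det(M)
print(J, (1.0 - sigma) ** (-n / 2))   # both equal prod_j 1/sqrt(1 - sigma)
```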
Hence, the joint cumulative distribution function of \({\tilde{T}}_{i1},\ldots ,{\tilde{T}}_{in_i}\) after integrating out \({\tilde{T}}_{i0}\) and making appropriate variable changes is given by
$$\begin{aligned} {F_{{\tilde{T}}_{i1},\ldots ,{\tilde{T}}_{in_i}}}&:= {\mathrm {C}}(\varPhi ({{\tilde{t}}_{i1}}),\ldots ,\varPhi ({{\tilde{t}}_{in_i}})) = \int _{ - \infty }^{ + \infty } {\left[ {\prod \limits _{j = 1}^{n_i} {\varPhi \left( {\frac{{{{\tilde{t}}_{ij}} - \sqrt{\sigma }{b_i}}}{{\sqrt{1 - \sigma } }}} \right) } } \right] } \,\phi ({b_i})d{b_i}\\&=\int _{ 0 }^{ 1 } { {\prod \limits _{j = 1}^{n_i} {\varPhi \left( {\frac{{{{\tilde{t}}_{ij}} - \sqrt{\sigma }{\varPhi ^{-1}(v_i) }}}{{\sqrt{1 - \sigma } }}} \right) } } } \,d{v_i}, \end{aligned}$$
a one-factor copula model where the bivariate linking copulas are all Gaussian with correlation parameter \(\theta =\sqrt{\sigma }\).
On the original time scale:
$$\begin{aligned} S(t_{i1},\ldots ,t_{in_i}|{\mathbf {Z}}_{i1},\ldots ,{\mathbf {Z}}_{in_i})=\int _{ 0 }^{ 1 } { {\prod \limits _{j = 1}^{n_i} {\varPhi \left( {\frac{{{\varPhi ^{-1}(S(t_{ij}|{\mathbf {Z}}_{ij}))} - \sqrt{\sigma }{\varPhi ^{-1}(v_i) }}}{{\sqrt{1 - \sigma } }}} \right) } } } \,d{v_i}, \end{aligned}$$
where \(S(t_{ij}|{\mathbf {Z}}_{ij})\) is the marginal survival function associated with a Cox proportional hazards model. \(\square \)
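The one-factor representation above can be checked numerically: the integral should coincide with the CDF of an equicorrelated multivariate normal vector with off-diagonal correlation \(\sigma\). A sketch with illustrative values (note that scipy's multivariate normal CDF is itself computed by quasi-Monte Carlo, so agreement holds only up to its accuracy):

```python
import numpy as np
from scipy import stats, integrate

sigma = 0.5                         # illustrative correlation parameter
t = np.array([0.2, -0.4, 1.0])      # illustrative evaluation points t_ij

def integrand(v):
    b = stats.norm.ppf(v)           # b_i = Phi^{-1}(v_i)
    return stats.norm.cdf((t - np.sqrt(sigma) * b) / np.sqrt(1.0 - sigma)).prod()

# one-factor copula form: integrate the product of conditional Gaussian CDFs
lhs, _ = integrate.quad(integrand, 0.0, 1.0)

# equicorrelated trivariate normal CDF with off-diagonal correlation sigma
cov = (1.0 - sigma) * np.eye(3) + sigma
rhs = stats.multivariate_normal.cdf(t, mean=np.zeros(3), cov=cov)
print(lhs, rhs)
```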
For convenience, before continuing with the other proofs, we first introduce some notation and definitions adapted from Prenen et al. (2017a) and Othus and Li (2010).
$$\begin{aligned} {Y_{ij}}(t)= & {} {I_{\left\{ {{X_{ij}} \ge t} \right\} }}\\ {\check{\varLambda }} (t)= & {} \int _0^t \frac{{d\sum \limits _{i = 1}^K {\sum \limits _{j = 1}^{{n_i}} {{\delta _{ij}}{I_{\left\{ {{X_{ij}} \le u} \right\} }}} } }}{{\sum \limits _{i = 1}^K {\sum \limits _{j = 1}^{{n_i}} {{Y_{ij}}(u)\exp \left\{ {\check{{\varvec{\beta }}}'{\mathbf{{Z}}_{ij}}(u)} \right\} } } }} \\= & {} \sum \limits _{i = 1}^K {\sum \limits _{j = 1}^{{n_i}} {\frac{{{\delta _{ij}}{I_{\left\{ {{X_{ij}} \le t} \right\} }}}}{{\sum \limits _{k = 1}^K {\sum \limits _{l = 1}^{{n_k}} {{I_{\left\{ {{X_{kl}} \le {X_{ij}}} \right\} }}\exp \left\{ {\check{{\varvec{\beta }}}'{\mathbf{{Z}}_{kl}}(X_{ij})} \right\} } } }}} } ,\\ {H_{ij}}= & {} \exp \left[ { - \int _0^\tau {{Y_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} d\varLambda (u)} } \right] ,\\ H_{ij}^0= & {} \exp \left[ { - \int _0^\tau {{Y_{ij}}(u)\exp \left\{ {{{{\varvec{\beta }'_0}}}{\mathbf{{Z}}_{ij}}(u)} \right\} d{\varLambda _0}(u)} } \right] ,\\ {{\check{H} }_{ij}}= & {} \exp \left[ { - \int _0^\tau {{Y_{ij}}(u)\exp \left\{ {\check{{\varvec{\beta }}}' {\mathbf{{Z}}_{ij}}(u)} \right\} d{\check{\varLambda }} (u)} } \right] ,\\ {H_{ij}}(t)= & {} \exp \left[ { - \int _0^\tau {{Y_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} d\left\{ {\varLambda + t\left( {\varGamma - \varLambda } \right) } \right\} (u)} } \right] . \end{aligned}$$
Note that \({H_{ij}} = {H_{ij}}(0)\). The likelihood function can then be written as
$$\begin{aligned} L\left( {\theta ,{\varvec{\beta }} ,\varLambda } \right)&= \prod \limits _{i = 1}^K {{L_i}(\theta ,\varvec{{\varvec{\beta }}} ,\varLambda )}\\&= \prod \limits _{i = 1}^K {\int _0^1 {\prod \limits _{j = 1}^{{n_i}} {{{\mathrm {c}}_{{\varvec{\cdot }}{V}}}{{({H_{ij}},{v_i})}^{{\delta _{ij}}}} \times } } \,{{{\mathrm {C}}}_{{\varvec{\cdot }}|{V}}}{{({H_{ij}}|{v_i})}^{1 - {\delta _{ij}}}}d{v_i}}\\&= \prod \limits _{i = 1}^K {\int _0^1 {\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log \left( {{{\mathrm {c}}_{{\varvec{\cdot }}{V}}}{{({H_{ij}},{v_i})}^{{\delta _{ij}}}} \times {{{\mathrm {C}}}_{{\varvec{\cdot }}|{V}}}{{({H_{ij}}|{v_i})}^{1 - {\delta _{ij}}}}} \right) } } \right\} d{v_i}} \,}\\&= \prod \limits _{i = 1}^K {\int _0^1 {\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log {\pmb {{{\mathrm {C}}}}}}(H_{ij},v_i;\theta ) } \right\} d{v_i}} }, \end{aligned}$$
where \(\pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta ) = {{{\mathrm {c}}_{{\varvec{\cdot }}{V}}}{{({H_{ij}},{v_i};\theta )}^{{\delta _{ij}}}} \times {{{\mathrm {C}}}_{{\varvec{\cdot }}|{V}}}{{({H_{ij}}|{v_i};\theta )}^{1 - {\delta _{ij}}}}}\).
$$\begin{aligned}&{l_K}(\theta ) = {K^{ - 1}}\log \left\{ {L\left( {\theta ,{\varvec{\beta }} ,\varLambda } \right) } \right\} ,\\&{l_{K0}}(\theta ) = {K^{ - 1}}\log \left\{ {L\left( {\theta ,{{\varvec{\beta }} _0},{\varLambda _0}} \right) } \right\} ,\\&{{\check{l} }_K}(\theta ) = {K^{ - 1}}\log \left\{ {L\left( {\theta ,\check{{\varvec{\beta }}} ,{\check{\varLambda }} } \right) } \right\} ,\\&{U_K}(\theta ) = \frac{{\partial {l_K}(\theta )}}{{\partial \theta }} = {K^{ - 1}}\frac{{\partial \log \left\{ {L\left( {\theta ,{\varvec{\beta }} ,\varLambda } \right) } \right\} }}{{\partial \theta }}\\&\quad = {K^{ - 1}}\sum \limits _{i = 1}^K {\frac{{\displaystyle \int _0^1 {\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log \pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} \left\{ {\sum \limits _{j = 1}^{{n_i}} {\frac{\partial }{{\partial \theta }}\log \pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} d{v_i}} }}{{\displaystyle \int _0^1 {\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log \pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} d{v_i}} }}},\\&{U_{K0}}(\theta ) = \frac{{\partial {l_{K0}}(\theta )}}{{\partial \theta }} = {K^{ - 1}}\frac{{\partial \log \left\{ {L\left( {\theta ,{{\varvec{\beta }} _0},{\varLambda _0}} \right) } \right\} }}{{\partial \theta }},\\&{{{\check{U}} }_K}(\theta ) = \frac{{\partial {{{\check{l}} }_K}(\theta )}}{{\partial \theta }} = {K^{ - 1}}\frac{{\partial \log \left\{ {L\left( {\theta ,{\check{\varvec{\beta }}} ,{{\check{\varLambda }}} } \right) } \right\} }}{{\partial \theta }}. \end{aligned}$$
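The ratio-of-integrals form of \(U_K(\theta)\) rests on interchanging differentiation in \(\theta\) with the integral over \(v_i\). This can be illustrated numerically for a single cluster, assuming (hypothetically) a Gaussian linking copula with all \(\delta_{ij}=0\), so that \(\log \pmb{{\mathrm {C}}}\) reduces to \(\log {\mathrm {C}}_{\varvec{\cdot }|V}\); the \(H_{ij}\) values and parameter are illustrative, and \(\partial/\partial\theta\) is approximated by central differences:

```python
import numpy as np
from scipy import stats, integrate

H = np.array([0.7, 0.5, 0.9])   # hypothetical H_ij values for one cluster
th, eps = 0.6, 1e-4             # copula parameter and difference step (illustrative)

def g(v, th):
    # sum_j log C_{.|V}(H_j | v; th) for a Gaussian linking copula, all delta_ij = 0
    z = (stats.norm.ppf(H) - th * stats.norm.ppf(v)) / np.sqrt(1.0 - th ** 2)
    return stats.norm.logcdf(z).sum()

I = lambda th: integrate.quad(lambda v: np.exp(g(v, th)), 0.0, 1.0,
                              epsabs=1e-12, epsrel=1e-12)[0]

# score by differentiating the log-integral directly ...
score_fd = (np.log(I(th + eps)) - np.log(I(th - eps))) / (2.0 * eps)

# ... and by the ratio-of-integrals expression for U_K
dg = lambda v: (g(v, th + eps) - g(v, th - eps)) / (2.0 * eps)
num = integrate.quad(lambda v: np.exp(g(v, th)) * dg(v), 0.0, 1.0,
                     epsabs=1e-12, epsrel=1e-12)[0]
score_ratio = num / I(th)
print(score_fd, score_ratio)
```

Both routes give the same value, matching the structure of \(U_K(\theta)\) above.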
The following notation is copied from Spiekerman and Lin (1998). Let \({\mathbf{{a}}^{ \otimes 0}} = 1,\,{\mathbf{{a}}^{ \otimes 1}} = \mathbf{{a}},\,{\mathbf{{a}}^{ \otimes 2}} = \mathbf{{a}}\mathbf{{a}}'\) for \(r=0,1,2\):
$$\begin{aligned} {\mathbf{{S}}^{(r)}}\left( {{\varvec{\beta }} ,t} \right)= & {} {K^{ - 1}}\sum \limits _{i = 1}^K {\sum \limits _{j = 1}^{{n_i}} {{Y_{ij}}(t)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(t)} \right\} {\mathbf{{Z}}_{ij}}{{(t)}^{ \otimes r}}} },\qquad \\ {\mathbf{{s}}^{(r)}}= & {} E\left[ {{\mathbf{{S}}^{(r)}}\left( {{\varvec{\beta }} ,t} \right) } \right] ,\\ \mathbf{{E}}({\varvec{\beta }} ,t)= & {} \frac{{{\mathbf{{S}}^{(1)}}\left( {{\varvec{\beta }} ,t} \right) }}{{{S^{(0)}}\left( {{\varvec{\beta }} ,t} \right) }},\\ \mathbf{{e}}({\varvec{\beta }} ,t)= & {} \frac{{{\mathbf{{s}}^{(1)}}\left( {{\varvec{\beta }} ,t} \right) }}{{{s^{(0)}}\left( {{\varvec{\beta }} ,t} \right) }},\\ \mathbf{{V}}({\varvec{\beta }} ,t)= & {} \frac{{{\mathbf{{S}}^{(2)}}\left( {{\varvec{\beta }} ,t} \right) }}{{{S^{(0)}}\left( {{\varvec{\beta }} ,t} \right) }} - \mathbf{{E}}{({\varvec{\beta }} ,t)^{ \otimes 2}},\\ \mathbf{{v}}({\varvec{\beta }} ,t)= & {} \frac{{{\mathbf{{s}}^{(2)}}\left( {{\varvec{\beta }} ,t} \right) }}{{{s^{(0)}}\left( {{\varvec{\beta }} ,t} \right) }} - \mathbf{{e}}{({\varvec{\beta }} ,t)^{ \otimes 2}}. \end{aligned}$$
Assume the following regularity conditions, where \(\tau >0\) is a constant denoting the last survival time of the uncensored subjects:
Condition 1
\({\varvec{\beta }}\) is in a compact subset of \({\mathbb {R}}^p\).
Condition 2
\(\varLambda (\tau ) < \infty \).
Condition 3
\(\theta \in \nu \), where \(\nu \) is a compact subset of \(\varTheta \).
Condition 4
\(P({ {{\mathcal {C}}}_{ij}} \ge t,\,\forall t \in [0,\tau ])> {\delta _c} > 0\) for \(i=1,\ldots ,K\) and \(j=1,\ldots ,n_i\).
Condition 5
Let \({\mathbf{{Z}}_{ij}}(t) = \left\{ {{Z_{ij1}}(t),\ldots ,{Z_{ijp}}(t)} \right\} \). For \(i=1,\ldots ,K\), \(j=1,\ldots ,n_i\) and \(k=1,\ldots ,p\),
$$\begin{aligned} \left| {{Z_{ijk}}(0)} \right| + \int _0^\tau {\left| {d{Z_{ijk}}(t)} \right| \le {B_Z} < \infty } \qquad \text {almost surely for some constant }B_Z. \end{aligned}$$
Condition 6
\(E[\log \left\{ {{L_i}({\theta _1},{\varvec{\beta }} ,\varLambda )/{L_i}({\theta _2},{\varvec{\beta }} ,\varLambda )} \right\} ]\) exists for all \({\theta _1},{\theta _2} \in \varTheta \), \(i=1,\ldots ,K\).
Condition 7
\(\mathbf{{A}} = \int _0^\tau {\mathbf{{v}}({{\varvec{\beta }} _0},u){s^{(0)}}({{\varvec{\beta }} _0},u)d{\varLambda _0}(u)}\) is positive definite.
Condition 8
The bivariate copula \({\mathrm {C}}_{\varvec{\cdot } V}(u_{ij},v_i;\theta )\) is absolutely continuous.
1.3 Proof of Theorem 2
Since the consistency of \(\check{{\varvec{\beta }}}\) and \({\check{\varLambda }}\) was already proved in Spiekerman and Lin (1998), we only show the consistency of \({\check{\theta }}\). This is done by extending the results in Prenen et al. (2017a) and Othus and Li (2010).
Accounting for the fact that we use plug-in estimators for \({\varvec{\beta }}\) and \(\varLambda \), we proceed by taking a Taylor series expansion of the log-likelihood of \(\theta \) in the neighbourhood of \({\varvec{\beta }}\) and \(\varLambda \). Since \(\varLambda \) is an unspecified function, we need to include a functional expansion term, for which the concept of Hadamard differentiability is suitable. In order to use this approach, we must first verify that the log-likelihood \(l(\theta )\) is Hadamard differentiable with respect to \(\varLambda \). By Condition 5, the term \({\int _0^\tau {{Y_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} d\varLambda (u)} }\) in \(H_{ij}\) is Hadamard differentiable and, by the chain rule for Hadamard derivatives (Van der Vaart 2000), we conclude that \(l(\theta )\) is Hadamard differentiable with respect to \(\varLambda \).
Let \({BV}[0,\tau ]\) denote the class of functions with bounded total variation on \([0,\tau ]\). The Hadamard derivative of \(l(\theta )\) with respect to \(\varLambda \) at \(\varGamma - \varLambda \in {{BV}}[0,\tau ]\) can be obtained by taking the derivative of \({K^{ - 1}}\log \left[ {L\left\{ {\theta ,{\varvec{\beta }} ,\varLambda + t\left( {\varGamma - \varLambda } \right) } \right\} } \right] \) with respect to \(t\) and then setting \(t=0\):
$$\begin{aligned}\frac{d}{{dt}}\left( {{K^{ - 1}}\log \left[ {L\left\{ {\theta ,{\varvec{\beta }} ,\varLambda + t\left( {\varGamma - \varLambda } \right) } \right\} } \right] } \right) \bigg |_{t=0} = \int _0^\tau {{\zeta _K}(\theta ,\varLambda )(u)d(\varGamma - \varLambda )(u)}, \end{aligned}$$
where \({\zeta _K}\left( {\theta ,\varLambda } \right) \left( u \right) \) is equal to
$$\begin{aligned}&{K^{ - 1}}\sum \limits _{i = 1}^K {\frac{{\displaystyle \int _0^1 {\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log \pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} \left[ {\sum \limits _{j = 1}^{{n_i}} {\left\{ {\left( {\frac{\partial }{{\partial {H_{ij}}}}\log \pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta )} \right) D_{ij}^\varLambda } \right\} } } \right] d{v_i}} }}{{\displaystyle \int _0^1 {\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log \pmb {{{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} d{v_i}} }}}\nonumber \\&\quad ={K^{ - 1}}\sum \limits _{i = 1}^K \displaystyle \int _0^1 P(v_i|H_{i\varvec{\cdot }},\theta )\left[ {\sum \limits _{j = 1}^{{n_i}} {\left\{ {\left( {\frac{\partial }{{\partial {H_{ij}}}}\log {{\pmb {{\mathrm {C}}}}}(H_{ij},v_i,\theta )} \right) D_{ij}^\varLambda } \right\} } } \right] d{v_i}\nonumber \\&\quad = {K^{ - 1}}\sum \limits _{i = 1}^K \sum \limits _{j = 1}^{n_i} D_{ij}^\varLambda \, E\left[ \frac{\partial }{{\partial {H_{ij}}}}\log {{\pmb {{\mathrm {C}}}}}(H_{ij},v_i,\theta )\right] , \end{aligned}$$
$$\begin{aligned}D_{ij}^\varLambda = ( - {H_{ij}}){Y_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} ,\end{aligned}$$
and
$$\begin{aligned} P(v_i|H_{i\varvec{\cdot }},\theta ) = \frac{\exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log {\pmb {{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} }{\displaystyle \int _0^1 \exp \left\{ {\sum \limits _{j = 1}^{{n_i}} {\log {\pmb {{\mathrm {C}}}}(H_{ij},v_i,\theta )} } \right\} d{v_i}} \end{aligned}$$
(14)
is a probability density function of a random variable \(V_i\) assuming values in [0, 1]. Similarly, the derivative of \(l(\theta )\) with respect to \({\varvec{\beta }}\) is
$$\begin{aligned} \zeta _K(\theta ,{\varvec{\beta }} )&={K^{ - 1}}\sum \limits _{i = 1}^K \displaystyle \int _0^1 P(v_i|H_{i\varvec{\cdot }},\theta )\left[ {\sum \limits _{j = 1}^{{n_i}} {\left\{ {\left( {\frac{\partial }{{\partial {H_{ij}}}}\log {{\pmb {{\mathrm {C}}}}}(H_{ij},v_i,\theta )} \right) D_{ij}^{\varvec{\beta }} } \right\} } } \right] d{v_i} \\&={K^{ - 1}}\sum \limits _{i = 1}^K \sum \limits _{j = 1}^{n_i} D_{ij}^{\varvec{\beta }} \, E\left[ \frac{\partial }{{\partial {H_{ij}}}}\log {{\pmb {{\mathrm {C}}}}}(H_{ij},v_i,\theta )\right] , \end{aligned}$$
where
$$\begin{aligned}D_{ij}^{\varvec{\beta }} = ( - {H_{ij}})\int _0^\tau {{Y_{ij}}(u){\mathbf{{Z}}_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} d\varLambda (u)}. \end{aligned}$$
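The density interpretation of Eq. (14) can be made concrete with a small numerical sketch. Assuming (hypothetically) a Gaussian conditional linking copula \({\mathrm {C}}_{\varvec{\cdot }|V}(u|v;\theta )=\varPhi \{(\varPhi ^{-1}(u)-\theta \varPhi ^{-1}(v))/\sqrt{1-\theta ^2}\}\), all \(\delta_{ij}=0\), and illustrative \(H_{ij}\) values, normalising \(\exp\{\sum_j \log \pmb{{\mathrm {C}}}\}\) by its integral yields a proper density on \([0,1]\):

```python
import numpy as np
from scipy import stats, integrate

theta = 0.6                       # linking-copula parameter (illustrative)
H = np.array([0.7, 0.5, 0.9])     # hypothetical H_ij values, all delta_ij = 0

def C_cond(u, v):
    # Gaussian conditional linking copula C_{.|V}(u | v; theta)
    return stats.norm.cdf((stats.norm.ppf(u) - theta * stats.norm.ppf(v))
                          / np.sqrt(1.0 - theta ** 2))

num = lambda v: np.exp(np.log(C_cond(H, v)).sum())   # exp{ sum_j log C }
Z, _ = integrate.quad(num, 0.0, 1.0)                 # normalising constant

P = lambda v: num(v) / Z                             # density of Eq. (14)
total, _ = integrate.quad(P, 0.0, 1.0)
print(total)   # integrates to one, so P is a proper density on [0, 1]
```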
Let \(\left\| {\, \cdot \,} \right\| \) denote the Euclidean norm and let \({\left\| {\, \cdot \,} \right\| _\infty }\) denote the supremum norm on \([0,\tau ]\). To prove the consistency of \({\check{\theta }}\), we need \({\left\| {{\zeta _K}(\theta ,\varLambda )} \right\| _\infty }\) and \(\left\| {{\zeta _K}(\theta ,{\varvec{\beta }} )} \right\| \) to be bounded. Note that, by the definition of \(H_{ij}\) and conditions 2 and 5, the terms \({\left\| {\, D_{ij}^\varLambda \,} \right\| _\infty }\) and \(\left\| {\, D_{ij}^{\varvec{\beta }} \,} \right\| \) are bounded. Therefore, in order to satisfy the boundedness condition of \({\left\| {{\zeta _K}(\theta ,\varLambda )} \right\| _\infty }\) and \(\left\| {{\zeta _K}(\theta ,{\varvec{\beta }} )} \right\| \), we shall require the expectations in their formulae to be finite.
We now continue with the proof by taking an expansion of \({{{\check{l}} }_K}(\theta )\) around \({{\varvec{\beta }} _0}\) and \({\varLambda _0}\), given by
$$\begin{aligned}{{{\check{l}} }_K}(\theta ) = {l_{K0}}(\theta ) + {\zeta _K}(\theta ,{{\varvec{\beta }} _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}) + \int _0^\tau {{\zeta _K}(\theta ,{\varLambda _0})(t)d({{\check{\varLambda }}} - {\varLambda _0})(t)} + R.\end{aligned}$$
Another (intuitive) notation is
$$\begin{aligned} {l_{K,\theta }}({\check{\varvec{\beta }}} ,{{\check{\varLambda }}} )= & {} {l_{K,\theta }}({{\varvec{\beta }} _0},{\varLambda _0}) + \frac{\partial }{{\partial {\varvec{\beta }} }}{l_{K,\theta }}({{\varvec{\beta }} _0},{\varLambda _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}) \\&+ \frac{\partial }{{\partial \varLambda }}{l_{K,\theta }}({{\varvec{\beta }} _0},{\varLambda _0})({{\check{\varLambda }}} - {\varLambda _0}) + R.\end{aligned}$$
The remainder term R is of order \({o_p}\left( {\max \left\{ {\left\| {{\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}} \right\| ,{{\left\| {{{\check{\varLambda }}} - {\varLambda _0}} \right\| }_\infty }} \right\} } \right) \). This can be seen from the definition of Hadamard differentiability, since
$$\begin{aligned}&{\left\| {\frac{{{l_{K,\theta }}\left\{ {{\varvec{\beta }} ,{\varLambda _0} + t({{\check{\varLambda }}} - {\varLambda _0})} \right\} - {l_{K,\theta }}({\varvec{\beta }} ,{\varLambda _0} )}}{t} - \frac{\partial }{{\partial \varLambda }}{l_{K,\theta }}({{\varvec{\beta }}},{\varLambda _0})({{\check{\varLambda }}} - {\varLambda _0})} \right\| _\infty } \\&\quad \rightarrow 0,\quad \text {as}\,\,t \downarrow 0 \end{aligned}$$
uniformly in \({\check{\varLambda }}-\varLambda _0\) in all compact subsets of \({\mathbb {D}}\), the space of cumulative hazard functions. Since \(\check{{\varvec{\beta }}}\) is consistent and \({\check{\varLambda }}\) is uniformly consistent (Spiekerman and Lin 1998), \(R={o_p}(1)\). To prove that \({\check{\theta }}\) is consistent we need to verify the uniform convergence of the log-likelihood with the plug-in estimate of \(\varLambda \) to the expected value of the log-likelihood evaluated at the true value of \(\varLambda \), denoted \(l_{K0}(\theta )\):
$$\begin{aligned} \mathop {\sup }\limits _{\theta \in \nu } \left| {{{{\check{l}} }_K}(\theta ) - E[{l_{K0}}(\theta )]} \right| = {o_P}(1). \end{aligned}$$
(15)
This can be shown as follows:
$$\begin{aligned} {{{\check{l}} }_K}(\theta ) - E[{l_{K0}}(\theta )]&= {l_{K0}}(\theta ) - E[{l_{K0}}(\theta )] + {\zeta _K}(\theta ,{{\varvec{\beta }} _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0})\\&\quad + \int _0^\tau {{\zeta _K}(\theta ,{\varLambda _0})(t)d({{\check{\varLambda }}} - {\varLambda _0})(t)} + R. \end{aligned}$$
By the law of large numbers, for fixed \(\theta \)
$$\begin{aligned} {l_{K0}}(\theta ) - E[{l_{K0}}(\theta )]\mathop \rightarrow \limits ^p 0. \end{aligned}$$
(16)
Since \(\left\| {{\zeta _K}(\theta ,{\varvec{\beta }} )} \right\| \) and \({\left\| {{\zeta _K}(\theta ,\varLambda )} \right\| _\infty }\) are bounded, say \(\left\| {{\zeta _K}(\theta ,{\varvec{\beta }} )} \right\| \le M_1\) and \({\left\| {{\zeta _K}(\theta ,\varLambda )} \right\| }_\infty \le M_2\), we have
$$\begin{aligned}&\mathop {\sup }\limits _{\theta \in \nu } \left| {{\zeta _K}(\theta ,{{\varvec{\beta }} _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0})} \right| \le {M_1}\left\| {{\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}} \right\| ,\\&\mathop {\sup }\limits _{\theta \in \nu } \left| {\int _0^\tau {{\zeta _K}(\theta ,{\varLambda _0})(t)d({{\check{\varLambda }}} - {\varLambda _0})(t)} } \right| \le {M_2}{\left\| {{{\check{\varLambda }}} - {\varLambda _0}} \right\| _\infty }. \end{aligned}$$
For this reason,
$$\begin{aligned} \mathop {\sup }\limits _{\theta \in \nu } \left| {{{{\check{l}} }_K}(\theta ) - E[{l_{K0}}(\theta )]} \right|&\le \mathop {\sup }\limits _{\theta \in \nu } \left| {{l_{K0}}(\theta ) - E[{l_{K0}}(\theta )]} \right| + {M_1}\left\| {{\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}} \right\| \\&\quad + {M_2}{\left\| {{{\check{\varLambda }}} - {\varLambda _0}} \right\| _\infty } + R. \end{aligned}$$
Using result (16), the consistency of \(\check{{\varvec{\beta }}}\) and the uniform consistency of \({\check{\varLambda }}\) and the fact that \(R={o_p}(1)\), we obtain
$$\begin{aligned}\mathop {\sup }\limits _{\theta \in \nu } \left| {{{{\check{l}} }_K}(\theta ) - E[{l_{K0}}(\theta )]} \right| = {o_p}(1).\end{aligned}$$
Finally, to verify that \({\check{\theta }}\) is consistent, we need to show that the expected log-likelihood is maximized at the true value:
$$\begin{aligned} E[{l_{K0}}(\theta )] - E[{l_{K0}}({\theta _0})] < 0. \end{aligned}$$
(17)
Since clusters are independent, the log-likelihood \(l_K(\theta )\) can be written as a sum of independent and identically distributed random variables:
$$\begin{aligned}{K^{ - 1}}\sum \limits _{i = 1}^K {\log \left\{ {{L_i}(\theta ,{\varvec{\beta }} ,\varLambda )} \right\} }, \end{aligned}$$
with
$$\begin{aligned} {L_i}&= {( - 1)^{{d_i}}}\frac{{{\partial ^{{d_i}}}}}{{{{\left( {\partial {x_{i1}}} \right) }^{{\delta _{i1}}}}\ldots \,\,{{\left( {\partial {x_{i{n_i}}}} \right) }^{{\delta _{i{n_i}}}}}}}{S}({x_{i1}},\ldots ,{x_{i{n_i}}})\\&= \int _0^1 \exp \left\{ \sum \limits _{j = 1}^{{n_i}} \log \left( { \mathrm {c}_{ \varvec{\cdot }V }}{{(\exp \{ - \varLambda ({x_{ij}})\} ,{v_i})}^{{\delta _{ij}}}} \right. \right. \\&\left. \left. \,\,\,\,\, \times {{{\mathrm {C}}}_{ \varvec{\cdot }|V }}{{(\exp \{ - \varLambda ({x_{ij}})\} |{v_i})}^{1 - {\delta _{ij}}}} \right) \right\} d{v_i}. \end{aligned}$$
Take \(\theta \ne \theta _0 \). The law of large numbers, Jensen’s inequality and condition 6 imply that
$$\begin{aligned}&\mathop {\lim }\limits _{K \rightarrow \infty } {l_{K0}}(\theta ) - {l_{K0}}({\theta _0}) = E[{l_{K0}}(\theta )] - E[{l_{K0}}({\theta _0})]\\&\quad = E\left[ {{K^{ - 1}}\sum \limits _{i = 1}^K {\log \left\{ {{L_i}(\theta ,{{\varvec{\beta }} _0},{\varLambda _0})} \right\} } } \right] - E\left[ {{K^{ - 1}}\sum \limits _{i = 1}^K {\log \left\{ {{L_i}({\theta _0},{{\varvec{\beta }} _0},{\varLambda _0})} \right\} } } \right] \\&\quad = E\left[ {\log \left\{ {{L_i}(\theta ,{{\varvec{\beta }} _0},{\varLambda _0})} \right\} - \log \left\{ {{L_i}({\theta _0},{{\varvec{\beta }} _0},{\varLambda _0})} \right\} } \right] \\&\quad = E\left[ {\log \left\{ {{L_i}(\theta ,{{\varvec{\beta }} _0},{\varLambda _0})/{L_i}({\theta _0},{{\varvec{\beta }} _0},{\varLambda _0})} \right\} } \right] \\&\quad \le \log \left( {E\left[ {{L_i}(\theta ,{{\varvec{\beta }} _0},{\varLambda _0})/{L_i}({\theta _0},{{\varvec{\beta }} _0},{\varLambda _0})} \right] } \right) \\&\quad = \log (1) = 0. \end{aligned}$$
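The Jensen's inequality step above can be illustrated with a toy likelihood ratio. In the sketch below (a hypothetical normal location model, all values illustrative), the expected likelihood ratio under the true parameter equals one, so its logarithm is zero, while the expected log-ratio is strictly negative:

```python
import numpy as np
from scipy import stats, integrate

theta0, theta = 0.0, 1.0     # true and alternative parameter values (illustrative)

logpdf = stats.norm.logpdf
logR = lambda x: logpdf(x, loc=theta) - logpdf(x, loc=theta0)  # log likelihood ratio
f0 = lambda x: stats.norm.pdf(x, loc=theta0)                   # density under theta0

# expectations under the true parameter theta0 (tail mass beyond +-30 is negligible)
E_ratio, _ = integrate.quad(lambda x: np.exp(logR(x)) * f0(x), -30, 30)
E_logR, _ = integrate.quad(lambda x: logR(x) * f0(x), -30, 30)
print(E_logR, np.log(E_ratio))   # E[log R] <= log E[R] = log(1) = 0
```

Here \(E[\log R] = -(\theta-\theta_0)^2/2 = -0.5\), strictly below \(\log E[R] = 0\), matching inequality (17).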
Since \({\check{\theta }}\) maximizes \(\check{l}(\theta )\), Eq. (15) implies that
$$\begin{aligned} 0 \le {{{\check{l}} }_K}\left( {{\check{\theta }} } \right) - {{{\check{l}} }_K}\left( {{\theta _0}} \right) + E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] - E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] = {{{\check{l}} }_K}\left( {{\check{\theta }} } \right) - E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] + {o_p}(1)\\ \implies E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] \le {{{\check{l}} }_K}\left( {{\check{\theta }} } \right) + {o_p}(1). \end{aligned}$$
Subtracting \(E\left[ {{l_{K0}}\left( {{\check{\theta }} } \right) } \right] \) from each side of the inequality we get
$$\begin{aligned}&E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] - E\left[ {{l_{K0}}\left( {{\check{\theta }} } \right) } \right] \le {{{\check{l}} }_K}\left( {{\check{\theta }} } \right) - E\left[ {{l_{K0}}\left( {{\check{\theta }} } \right) } \right] + {o_p}(1)\nonumber \\&\quad \le \mathop {\sup }\limits _{\theta \in \nu } \left| {{{{\check{l}} }_K}\left( \theta \right) - E\left[ {{l_{K0}}\left( \theta \right) } \right] } \right| + {o_p}(1) = {o_p}(1). \end{aligned}$$
(18)
Now take \(\theta \) such that \(|\theta -\theta _0|\ge \varepsilon \) for any fixed \(\varepsilon >0\). By inequality (17), there must be some \({\gamma _\varepsilon }>0\) such that
$$\begin{aligned}E\left[ {{l_{K0}}\left( {{\check{\theta }} } \right) } \right] + {\gamma _\varepsilon } < E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] .\end{aligned}$$
It follows that
$$\begin{aligned}P(|{\check{\theta }} - {\theta _0}| \ge \varepsilon ) \le P\left\{ {E\left[ {{l_{K0}}\left( {{\check{\theta }} } \right) } \right] + {\gamma _\varepsilon } < E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] } \right\} .\end{aligned}$$
Equation (18) implies that
$$\begin{aligned}P\left\{ {E\left[ {{l_{K0}}\left( {{\check{\theta }} } \right) } \right] + {\gamma _\varepsilon } < E\left[ {{l_{K0}}\left( {{\theta _0}} \right) } \right] } \right\} \rightarrow 0 \qquad \text {as}\,\, K\rightarrow \infty . \end{aligned}$$
Therefore
$$\begin{aligned}P(|{\check{\theta }} - {\theta _0}| \ge \varepsilon ) \rightarrow 0 \qquad \text {as}\,\, K\rightarrow \infty . \end{aligned}$$
1.4 Proof of Theorem 3
Take a first order Taylor series expansion of \({{{\check{U}} }_K}({\check{\theta }} )\) around \(\theta _0\):
$$\begin{aligned}{{{\check{U}} }_K}({\check{\theta }} ) = {{{\check{U}} }_K}({\theta _0}) + \left( {{\check{\theta }} - {\theta _0}} \right) \frac{{\partial {{{\check{U}} }_K}}}{{\partial \theta }}\bigg |_{\theta =\theta ^*},\end{aligned}$$
where \(\theta ^*\) lies between \({\check{\theta }}\) and \(\theta _0\). It must be that \(\check{U}_K({\check{\theta }})=0\), since \({\check{\theta }}\) was taken to be the maximizer of \(L(\theta ,\check{{\varvec{\beta }}},{\check{\varLambda }})\). For this reason,
$$\begin{aligned} \sqrt{K} \left( {{\check{\theta }} - {\theta _0}} \right) = \frac{{\sqrt{K} {{{\check{U}} }_K}\left( {{\theta _0}} \right) }}{{ - \partial {{{\check{U}} }_K}/\partial \theta \big |_{\theta =\theta ^*}}}. \end{aligned}$$
(19)
We already showed that \({\check{\theta }}\) consistently estimates \(\theta _0\), so the law of large numbers implies that
$$\begin{aligned}\frac{{\partial {{{\check{U}} }_K}}}{{\partial \theta }}\bigg |_{\theta =\theta ^*} \rightarrow W\left( {{\theta _0}} \right) = \mathop {\lim }\limits _{K \rightarrow \infty } \frac{{\partial {{{\check{U}} }_K}}}{{\partial \theta }}\bigg |_{\theta =\theta _0}.\end{aligned}$$
We shall show that the score \(\check{U}_K({\theta }_0)\) in the numerator of Eq. (19) is asymptotically normally distributed. To this end, we need a Taylor series expansion of \(\check{U}_K({\theta _0})\) in the neighbourhood of \({\varvec{\beta }}_0\) and \(\varLambda _0\). Because \(\varLambda _0\) is an unspecified function, we shall use the Hadamard derivative of \(U_K(\theta _0)\) with respect to \(\varLambda \) at \(\varGamma - \varLambda \in {{BV}}[0,\tau ]\):
$$\begin{aligned}\frac{d}{{dt}}\left( {{K^{ - 1}}\frac{{\partial \log \left[ {L\left\{ {\theta ,{\varvec{\beta }} ,\varLambda + t(\varGamma - \varLambda )} \right\} } \right] }}{{\partial \theta }}} \right) \bigg |_{t=0} = \int _0^\tau {{\xi _K}(\theta ,\varLambda )(u)d(\varGamma - \varLambda )(u)}, \end{aligned}$$
where \({\xi _K}(\theta ,\varLambda )(u)\) is equal to
$$\begin{aligned}&{K^{ - 1}}\sum \limits _{i = 1}^K \left[ \displaystyle \int _0^1 P(v_i|H_{i\varvec{\cdot }},\theta ) \left\{ {\sum \limits _{j = 1}^{{n_i}} {\frac{{{\partial ^2}\log {\pmb {{\mathrm {C}}}}(H_{ij},v_i,\theta )}}{{\partial \theta \partial {H_{ij}}}}D_{ij}^\varLambda } } + \sum \limits _{j = 1}^{{n_i}} \frac{\partial \log \pmb {{\mathrm {C}}}(H_{ij},v_i,\theta )}{\partial \theta }\right. \right. \\&\,\,\quad \left. \times \sum \limits _{j = 1}^{{n_i}} \frac{\partial \log \pmb {{\mathrm {C}}}(H_{ij},v_i,\theta )}{\partial H_{ij}}D_{ij}^\varLambda \right\} dv_i \\&\qquad -\displaystyle \int _0^1 P(v_i|H_{i\varvec{\cdot }},\theta )\left\{ \sum \limits _{j = 1}^{{n_i}} \frac{\partial \log \pmb {{\mathrm {C}}}(H_{ij},v_i,\theta )}{\partial \theta }\right\} dv_i\\&\,\,\quad \left. \times \displaystyle \int _0^1 P(v_i|H_{i\varvec{\cdot }},\theta )\left\{ \sum \limits _{j = 1}^{{n_i}} \frac{\partial \log \pmb {{\mathrm {C}}}(H_{ij},v_i,\theta )}{\partial H_{ij}}D_{ij}^{\varLambda }\right\} dv_i\right] \\&\quad ={K^{ - 1}}\sum \limits _{i = 1}^K\sum \limits _{j = 1}^{n_i}D_{ij}^{\varLambda }\left\{ E\left[ \frac{{{\partial ^2}\log {\pmb {{\mathrm {C}}}}(H_{ij},v_i,\theta )}}{{\partial \theta \partial {H_{ij}}}} \right] \right. \\&\,\,\quad \left. + \sum \limits _{k = 1}^{n_i}\mathrm {Cov}\left[ \frac{{{\partial }\log {\pmb {{\mathrm {C}}}}(H_{ij},v_i,\theta )}}{{\partial \theta }},\frac{{{\partial }\log {\pmb {{\mathrm {C}}}}(H_{ik},v_i,\theta )}}{{\partial H_{ik}}} \right] \right\} ,\\&D_{ij}^\varLambda = ( - {H_{ij}}){Y_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} , \end{aligned}$$
and \(P(v_i|H_{i\varvec{\cdot }},\theta )\) has the same definition as in expression (14). The derivative of \(U_K(\theta _0)\) with respect to \({\varvec{\beta }}\) is given by the same expression as \({\xi _K}(\theta ,\varLambda )(u)\), replacing \(D_{ij}^\varLambda \) with
$$\begin{aligned}D_{ij}^{\varvec{\beta }} = ( - {H_{ij}})\int _0^\tau {{Y_{ij}}(u){\mathbf{{Z}}_{ij}}(u)\exp \left\{ {{\varvec{\beta }} '{\mathbf{{Z}}_{ij}}(u)} \right\} d\varLambda (u)}. \end{aligned}$$
As in the proof of the consistency of \({\check{\theta }}\), we also need \({\left\| {{\xi _K}(\theta ,\varLambda )} \right\| _\infty }\) and \(\left\| {{\xi _K}(\theta ,{\varvec{\beta }} )} \right\| \) to be bounded; for this we require the expectation and covariance in their formulae to be finite. We then take a Taylor series expansion of \(\check{U}_K({\theta _0})\) in a neighbourhood of \({\varvec{\beta }}_0\) and \(\varLambda _0\), which gives
$$\begin{aligned} {{{\check{U}} }_K}({\theta _0})= & {} {U_{K0}}({\theta _0}) + {\xi _K}({\theta _0},{{\varvec{\beta }} _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}) \\&+ \int _0^\tau {{\xi _K}({\theta _0},{\varLambda _0})(t)d\left\{ {{{\check{\varLambda }}} (t) - {\varLambda _0}(t)} \right\} } + {G_K}, \end{aligned}$$
where \({G_K}\) is the remainder term of the Taylor series. Since \({\check{\varLambda }}\) is \(\sqrt{K}\)-consistent, it can be shown that \({G_K}=o_p(K^{-1/2})\). Define the pointwise limit of \({{\xi _K}(\theta ,\varLambda )}(t)\) as \({{\xi }(\theta ,\varLambda )}(t)\) and denote \({{\xi }(\theta ,{\varvec{\beta }} )}=E[{{\xi _K}(\theta ,{\varvec{\beta }} )}]\). Since \({\left\| {{\xi _K}(\theta ,\varLambda )} \right\| _\infty }\) and \(\left\| {{\xi _K}(\theta ,{\varvec{\beta }} )} \right\| \) are bounded, \({\left\| {{\xi }(\theta ,\varLambda )} \right\| _\infty }\) and \(\left\| {{\xi }(\theta ,{\varvec{\beta }} )} \right\| \) are bounded as well. Therefore
$$\begin{aligned} \sqrt{K} {{{\check{U}} }_K}({\theta _0})&= \sqrt{K} \left[ {U_{K0}}({\theta _0}) + {\xi _K}({\theta _0},{{\varvec{\beta }} _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}) \right. \\&\quad \left. + \int _0^\tau {{\xi _K}({\theta _0},{\varLambda _0})(t)d\left\{ {{{\check{\varLambda }}} (t) - {\varLambda _0}(t)} \right\} } \right] + {o_p}(1). \end{aligned}$$
By Spiekerman and Lin (1998),
$$\begin{aligned}\sqrt{K} ({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}) = {\mathbf{{A}}^{ - 1}}{K^{ - 1/2}}\sum \limits _{i = 1}^K {{\mathbf{{w}}_{i.}}} + o_p(1), \end{aligned}$$
where \({{\mathbf{{w}}_{i.}}}\) is the contribution of the \(i\)th cluster to the score function for \({\varvec{\beta }}\) under the independence working assumption, evaluated at \({\varvec{\beta }}_0\):
$$\begin{aligned}{\mathbf{{w}}_{i.}} = \sum \limits _{j = 1}^{{n_i}} {\int _0^\tau {\left\{ {{\mathbf{{Z}}_{ij}}(u) - E\left[ {{{\varvec{\beta }} _0},u} \right] } \right\} d{M_{ij}}(u)} }, \end{aligned}$$
with
$$\begin{aligned}{M_{ij}}(t) = {\delta _{ij}}{Y_{ij}}(t) - \int _0^t {{Y_{ij}}(u)\exp \left\{ {{{{\varvec{\beta }} '}_0}{\mathbf{{Z}}_{ij}}(u)} \right\} d{\varLambda _0}(u)}. \end{aligned}$$
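As a purely illustrative numerical sketch (not the paper's clustered model: a single stratum, one time-fixed covariate, and hypothetical simulated data), the following code evaluates martingale residuals of this form at the observed times, with the Breslow estimator plugged in for \(\varLambda _0\); in that case the residuals sum to zero exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-level data: one time-fixed covariate, no clustering.
K = 200
Z = rng.normal(size=K)
beta = 0.5
T = rng.exponential(1.0 / np.exp(beta * Z))  # event times with hazard exp(beta*Z)
C = rng.exponential(2.0, size=K)             # independent censoring times
X = np.minimum(T, C)                         # observed times
delta = (T <= C).astype(float)               # event indicators

def breslow_cumhaz(t):
    """Breslow estimate of the baseline cumulative hazard Lambda0 at time t."""
    risk = np.exp(beta * Z)
    H = 0.0
    for s in np.sort(X[delta == 1.0]):
        if s > t:
            break
        H += 1.0 / risk[X >= s].sum()  # jump 1/S0(s) at each event time s <= t
    return H

# Martingale residual at the observed time: M_i = delta_i - exp(beta*Z_i)*Lambda0(X_i).
M = delta - np.exp(beta * Z) * np.array([breslow_cumhaz(x) for x in X])
```

The sum-to-zero property follows from the definition of the Breslow estimator and holds at any fixed value of \({\varvec{\beta }}\), not only at the estimate.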
They also showed that
$$\begin{aligned}\sqrt{K} \left\{ {{{{{\check{\varLambda }}} }_0}(t,{\check{\varvec{\beta }}} ) - {\varLambda _0}(t)} \right\} \rightarrow {\mathcal {W}}(t) = {K^{ - 1/2}}\sum \limits _{i = 1}^K {{\varPsi _i}(t)}, \end{aligned}$$
where \({\mathcal {W}}(t)\) is a zero mean Gaussian process with variance function
$$\begin{aligned}E\left[ {{\varPsi _i}{{(t)}^2}} \right] ,\end{aligned}$$
with
$$\begin{aligned}{\varPsi _i}(t) = \int _0^t {\frac{{d{M_{i.}}(u)}}{{{s^{(0)}}\left( {{{\varvec{\beta }} _0},u} \right) }}} + {\mathbf{{h}}^T}(t){\mathbf{{A}}^{ - 1}}{\mathbf{{w}}_{i.}} \end{aligned}$$
and
$$\begin{aligned}\mathbf{{h}}(t) = - \int _0^t {\mathbf{{e}}\left( {{{\varvec{\beta }} _0},u} \right) d{\varLambda _0}(u)}. \end{aligned}$$
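As a hypothetical sanity check of the convergence of \({\check{\varLambda }}\) (again far simpler than the present setting), note that with no covariates the Breslow estimator reduces to the Nelson–Aalen estimator; for unit-exponential event times \(\varLambda _0(t)=t\), so the estimate at a fixed time should approach \(t\) as \(K\) grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical special case: no covariates, unit-exponential event times,
# independent exponential censoring; then Lambda0(t) = t.
K = 5000
T = rng.exponential(1.0, size=K)
C = rng.exponential(3.0, size=K)
X = np.minimum(T, C)
delta = T <= C

def nelson_aalen(t):
    """Nelson-Aalen estimate of the cumulative hazard at time t."""
    H = 0.0
    for s in np.sort(X[delta]):
        if s > t:
            break
        H += 1.0 / (X >= s).sum()  # jump 1/(number at risk) at each event time
    return H

err = abs(nelson_aalen(1.0) - 1.0)  # compare with Lambda0(1) = 1
```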
Therefore,
$$\begin{aligned}&\sqrt{K} \left[ {{U_{K0}}({\theta _0}) + {\xi _K}({\theta _0},{{\varvec{\beta }} _0})({\check{\varvec{\beta }}} - {{\varvec{\beta }} _0}) + \int _0^\tau {{\xi _K}({\theta _0},{\varLambda _0})(t)d\left\{ {{{\check{\varLambda }}} (t) - {\varLambda _0}(t)} \right\} } } \right] \\&\quad =\sqrt{K} \left[ {K^{ - 1}}\sum \limits _{i = 1}^K {{\phi _i}({\theta _0})} + {K^{ - 1}}{\xi _K}({\theta _0},{{\varvec{\beta }} _0}){\mathbf{{A}}^{ - 1}}\sum \limits _{i = 1}^K {{\mathbf{{w}}_{i.}}}\right. \\&\qquad \left. \,\,\,\,+ {K^{ - 1/2}}\int _0^\tau {{\xi _K}({\theta _0},{\varLambda _0})(t)d\left\{ {{K^{ - 1/2}}\sum \limits _{i = 1}^K {{\varPsi _i}(t)} } \right\} } \right] \\&\quad =\sqrt{K} \sum \limits _{i = 1}^K \left[ {K^{ - 1}}{\phi _i}({\theta _0}) + {K^{ - 1}}{\xi _K}({\theta _0},{{\varvec{\beta }} _0}){\mathbf{{A}}^{ - 1}}{\mathbf{{w}}_{i.}} + {K^{ - 1}}\int _0^\tau {{\xi _K}({\theta _0},{\varLambda _0})(t)d{\varPsi _i}(t)} \right] \\&\quad ={K^{ - 1/2}}\sum \limits _{i = 1}^K {\left[ {{\phi _i}({\theta _0}) + {\xi _K}({\theta _0},{{\varvec{\beta }} _0}){\mathbf{{A}}^{ - 1}}{\mathbf{{w}}_{i.}} + \int _0^\tau {{\xi _K}({\theta _0},{\varLambda _0})(t)d{\varPsi _i}(t)} } \right] } \\&\quad ={K^{ - 1/2}}\sum \limits _{i = 1}^K {{\varXi _i}}. \end{aligned}$$
The central limit theorem implies that \(\sqrt{K}\check{U}_K(\theta _0)\) converges in distribution to a normal random variable with mean 0 and variance equal to that of \(\varXi \). Thus we have
$$\begin{aligned}\sqrt{K} \left( {{\check{\theta }} - {\theta _0}} \right) = \frac{{\sqrt{K} {{{\check{U}} }_K}({\theta _0})}}{{ - \partial {{{\check{U}} }_K}/\partial \theta \big |_{\theta =\theta ^*}}},\end{aligned}$$
where \(\theta ^*\) lies between \({\check{\theta }}\) and \({\theta _0}\). Here
$$\begin{aligned}\sqrt{K} {{{\check{U}} }_K}({\theta _0})\mathop \rightarrow \limits ^D N\left\{ {0,{\mathop {\mathrm {var}}} \left( \varXi \right) } \right\} \end{aligned}$$
and
$$\begin{aligned}- \partial {{{\check{U}} }_K}/\partial \theta \big |_{\theta =\theta ^*}\mathop \rightarrow \limits ^P W\left( {{\theta _0}} \right) .\end{aligned}$$
By Slutsky’s theorem, \(\sqrt{K} \left( {{\check{\theta }} - {\theta _0}} \right) \) converges in distribution to a normal random variable with mean 0 and variance
$$\begin{aligned}{\mathop {\mathrm {var}}} \left( \varXi \right) /W{\left( {{\theta _0}} \right) ^2}. \end{aligned}$$
The variance of \(\varXi \) (note that \(\mathrm {var}[\varXi ]=E[\varXi ^2]\), since \(E[\varXi ]=0\)) can be estimated by \({K^{ - 1}}\sum \nolimits _{i = 1}^K {{{\check{\varXi }}_i ^2}}\), where \({\check{\varXi }_i}\) is obtained from \({\varXi _i}\) by replacing parameter values with their estimates. \(W(\theta _0)\) can be estimated by the (minus) derivative of the pseudoscore function \(\check{U}_K(\theta )\) evaluated at \({\check{\theta }}\). Therefore, the standard error of \({\check{\theta }}\) is given by the square root of
$$\begin{aligned} \mathrm {var}({\check{\theta }})=\frac{K^{-2}\sum \nolimits _{i = 1}^K {{{\check{\varXi }}_i ^2}}}{\left( \partial {{{\check{U}} }_K}/\partial \theta \big |_{\theta ={\check{\theta }}}\right) ^2}. \end{aligned}$$
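To make the plug-in variance formula concrete, the following sketch applies it to a hypothetical one-parameter toy problem (an exponential rate estimated from an i.i.d. sample, not the pseudo-likelihood considered here): with estimating function \(\phi _i(\theta ) = 1/\theta - x_i\) and \(\check{\varXi }_i = \phi _i({\check{\theta }})\), the resulting standard error can be compared with the textbook value \({\check{\theta }}/\sqrt{K}\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy problem: estimate an Exponential(theta) rate from K draws.
K = 20000
theta0 = 2.0
x = rng.exponential(1.0 / theta0, size=K)

# The estimating equation U_K(theta) = K^{-1} sum_i (1/theta - x_i) = 0
# gives theta_hat = 1 / mean(x).
theta_hat = 1.0 / x.mean()

Xi = 1.0 / theta_hat - x          # influence terms evaluated at theta_hat
dU = -1.0 / theta_hat**2          # derivative of U_K at theta_hat

# Plug-in sandwich standard error: sqrt( K^{-2} sum_i Xi_i^2 / (dU)^2 ).
se = np.sqrt(Xi @ Xi / K**2 / dU**2)

# For this toy model the textbook asymptotic s.e. is theta_hat / sqrt(K),
# so the ratio below should be close to 1 for large K.
ratio = se / (theta_hat / np.sqrt(K))
```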