Abstract
Incomplete information on explanatory variables is commonly encountered in studies of possibly censored event times. A popular approach to dealing with partially observed covariates is multiple imputation, where a number of completed data sets that can be analyzed by standard complete-data methods are obtained by imputing missing values from an appropriate distribution. We show how the combination of multiple imputations from a compatible model with suitably estimated parameters and the usual Cox regression estimators leads to consistent and asymptotically Gaussian estimators of both the finite-dimensional regression parameter and the infinite-dimensional cumulative baseline hazard parameter. We also derive a consistent estimator of the covariance operator. Simulation studies and an application to a study on survival after treatment for liver cirrhosis show that the estimators perform well with moderate sample sizes and indicate that iterating the multiple-imputation estimator increases the precision.
References
Andersen, P., Borgan, Ø., Gill, R., Keiding, N. (1992). Statistical models based on counting processes. New York: Springer.
Bartlett, J., Seaman, S., White, I., Carpenter, J., The Alzheimer’s Disease Neuroimaging Initiative. (2015). Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research, 24, 462–487.
Chen, H. Y. (2002). Double-semiparametric method for missing covariates in Cox regression models. Journal of the American Statistical Association, 97, 565–576.
Chen, H. Y., Little, R. (1999). Proportional hazards regression with missing covariates. Journal of the American Statistical Association, 94, 896–908.
Cox, D. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220.
Dudley, R. (1984). A course on empirical processes, volume 1097 of Lecture Notes in Mathematics. Berlin: Springer.
Herring, A., Ibrahim, J. (2001). Likelihood-based methods for missing covariates in the Cox proportional hazards model. Journal of the American Statistical Association, 96, 292–302.
Jacobsen, M., Keiding, N. (1995). Coarsening at random in general sample spaces and random censoring in continuous time. The Annals of Statistics, 23(3), 774–786.
Kosorok, M. (2008). Introduction to empirical processes and semiparametric inference. New York: Springer.
Martinussen, T. (1999). Cox regression with incomplete covariate measurements using the EM-algorithm. Scandinavian Journal of Statistics, 26, 479–491.
Nielsen, S. F. (2003). Proper and improper multiple imputation. International Statistical Review, 71(3), 593–607.
Pugh, M., Robins, J., Lipsitz, S., Harrington, D. (1993). Inference in the Cox proportional hazards model with missing covariate data. Technical Report, Department of Biostatistics, Harvard School of Public Health.
Qi, L., Wang, C., Prentice, R. (2005). Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association, 100, 1250–1263.
Robins, J., Wang, N. (2000). Inference for imputation estimators. Biometrika, 87, 113–124.
Rubin, D. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–489.
Schenker, N., Welsh, A. H. (1988). Asymptotic results for multiple imputation. The Annals of Statistics, 16(4), 1550–1566.
Schlichting, P., Christensen, E., Andersen, P., Fauerholdt, L., Juhl, E., Poulsen, H., Tygstrup, N. (1983). Prognostic factors in cirrhosis identified by Cox’s regression model. Hepatology, 3, 889–895.
Sterne, J., White, I., Carlin, J., Spratt, M., Royston, P., Kenward, M., Wood, A., Carpenter, J. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 339, b2393.
Tsiatis, A. (2006). Semiparametric theory and missing data. New York: Springer.
van der Vaart, A. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.
van der Vaart, A., Wellner, J. (1996). Weak convergence and empirical processes. With applications to statistics. New York: Springer.
Wang, N., Robins, J. (1998). Large-sample theory for parametric multiple imputation procedures. Biometrika, 85, 935–948.
White, I., Royston, P. (2009). Imputing missing covariate values for the Cox model. Statistics in Medicine, 28, 1982–1998.
Appendix
1.1 Assumptions
Assumption 1
Assume that \((\beta _0,\theta _0)\in {\mathcal {B}}\times \Theta \) for known compact sets \({\mathcal {B}}\subset {\mathbb {R}}^p\) and \(\Theta \subset {\mathbb {R}}^q\), and that \(A_0(t)\) is strictly increasing and continuously differentiable and that \(A_0(0)=0\).
Assumption 2
The covariates X are bounded almost surely.
Assumption 3
Data are missing at random, \(pr({\mathcal {C}}=r|Z=z)=pr\{{\mathcal {C}}=r|G_{{\mathcal {C}}}(Z)=G_{r}(z)\}\).
Assumption 4
The full-data information matrix, \(I^F\), for \(\beta \) at the true parameter value is invertible.
Assumption 5
There is a finite maximum follow-up time \(\tau >0\), when all individuals still at risk are censored, and \(pr\{Y(\tau )=1\}=pr(T=\tau )>0\).
Assumption 6
The censoring distribution does not depend on \(\phi _0\) and potentially missing covariates, \( \alpha _{U}(t|x)^{1-\delta }pr(U>t|x)=\alpha _{U}\{t|G_{X,r}(x)\}^{1-\delta }pr\{U>t|G_{X,r}(x)\}\).
Assumption 7
There exists a consistent (but possibly inefficient) asymptotically linear estimator \({\hat{\phi }^{I}}=\{{\hat{\beta }}^{I},{\hat{A}}^I(t),{\hat{\theta }}^{I}\}\) such that \(n^{1/2}({\hat{\phi }^{I}}-\phi _0)(t)=n^{-1/2}\sum _{i=1}^n q\{\mathcal C_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)+o_P(1)\), where the \(q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)\) are independent processes and \(n^{1/2}({\hat{\phi }^{I}}-\phi _0)\) converges weakly to a tight Gaussian process in \({\mathbb {R}}^{p}\times \ell ^{\infty }[0,\tau ]\times {\mathbb {R}}^{q}\). Further, we assume that the variance \(\textit{var}[q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)]\) can be estimated consistently by \(n^{-1}\sum _{i=1}^n{\hat{q}}\{{\mathcal {C}}_i,G_{\mathcal C_i}(Z_i)\}(t){\hat{q}}\{{\mathcal {C}}_i,G_{\mathcal C_i}(Z_i)\}(t)^\top \) for some suitable \({\hat{q}}\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)\).
Assumption 8
Assume that \(p_{X|{\mathcal {C}},G}(x|r,g,\phi )\), the conditional density of X given \({\mathcal {C}}\) and \(G_{{\mathcal {C}}}\) with respect to a reference measure \(\nu _X\), is a Lipschitz continuous function of \(\phi \) (with respect to the \(L_2\)-norm) in a neighborhood of \(\phi _0\), with an integrable Lipschitz constant, h(x|r, g) such that \(\int h(x|r,g){\mathrm {d}}\nu _X(x)\) is a bounded function of (r, g).
1.2 Lemmas
We first introduce some notation. The densities of the (potentially unobserved) full data \(z=(t,\delta ,x)\) and of the observed data \(\{r,g=(t,\delta ,g_{x})\}\) are
where \(\nu _{\cdot }(\cdot )\) is a dominating measure for which the densities of the random variables are defined. Recall the definition \(\tilde{p}_{Z}(z,\phi )=\exp \left\{ \delta \beta ^{\top }x-A(t)\exp \left( \beta ^{\top }x\right) \right\} p_{X}(x,\theta )\) and let \(\tilde{p}_{G}(g,\phi )=\int _{\{G_{r}(v)=g\}} \tilde{p}_{Z}(v){\mathrm {d}}\nu _Z(v)\). Note that
The following lemma, which builds on Wang and Robins (1998) and Robins and Wang (2000) (see also Tsiatis 2006, Lemma 14.2), will be used repeatedly.
Lemma 1
For f(t, Z), continuous in \(t\in [0,\tau ]\) and bounded with probability one,
where the remainder term is uniform in t.
Proof
Following Tsiatis (2006, pp. 350–352), we write
so that
\(\square \)
Lemma 2
Let \(f[\{X_{ij}(\phi ),{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}_{j=1,\ldots ,m}]\) be a bounded function. Then the logarithm of the \(\epsilon \)-bracketing number of the class
is bounded by a constant times \(1/\epsilon \).
Proof
Let \(F_i(\phi )=E(f[\{X_{ij}(\phi )\}_{j=1,\ldots ,m},{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))\). Then
by assumption 8. It follows that the bracketing number of the class (8) is bounded by the bracketing number of \(\{\phi : \Vert \phi -\phi _0\Vert _{L_2}\le \delta \}\) and this is dominated by the bracketing number of the integrated baseline hazard which is smaller than \(\exp (K/\epsilon )\) by van der Vaart and Wellner (1996, Theorem 2.7.5) for a constant K.\(\square \)
It follows that for a bounded function f, the process
is stochastically equicontinuous near \(\phi _0\), and that
converges almost surely, uniformly in a neighborhood of \(\phi _0\). The process
is not stochastically equicontinuous in general. A proof of this is included at the end of this appendix.
We will need some results for averages of functions of the imputations and the unknown parameter.
Lemma 3
Let \(f[ \{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi ]\) be a bounded function which is Lipschitz continuous as a function of \(\phi \) in a neighborhood of \(\phi _0\) with a bounded Lipschitz constant. Then
converges in probability to 0 for any consistent estimator \({\tilde{\phi }}\) of \(\phi _0\).
Proof
As \(|f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]-f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi _0]|\le \textit{constant}\times \Vert {\tilde{\phi }}-\phi _0\Vert _{L_2}\), we only need to consider the case where \({\tilde{\phi }}=\phi _0\). Letting
we see that \(E\{F_i|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}=0\) so that
as \(F_i\) is bounded by assumption.\(\square \)
Corollary 1
Let \(f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi ]\) be a bounded function, which is Lipschitz continuous as a function of \(\phi \) in a neighborhood of \(\phi _0\) with a bounded Lipschitz constant. Suppose further that \(E(f[\{Z_{ij}(\phi ')\}_{j=1,\ldots ,m},\phi _0])\) is a continuous function of \(\phi '\). Then
in probability for any consistent estimator \({\tilde{\phi }}\) of \(\phi _0\).
Proof
The average \(n^{-1}\sum _{i=1}^nf[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]\) may be split into a sum of
which is \(o_P(1)\) by lemma 3, and \(n^{-1}\sum _{i=1}^n E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m},\phi _0]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))_{|\phi ={\hat{\phi }^{I}}}\) which converges to \(E(f[\{Z_{1j}\}_{j=1,\ldots ,m},\phi _0])\) by lemma 2 and the uniform law of large numbers.\(\square \)
Lemma 4
If \({\tilde{\beta }}\rightarrow \beta _0\) in probability, then
in probability, uniformly in \(t\in [0,\tau ]\).
Proof
It suffices to consider the case where X is one-dimensional. Clearly, by differentiability and boundedness,
so we may replace \({\tilde{\beta }}\) by \(\beta _0\). Furthermore, by corollary 1, \(n^{-1}\sum _{i=1}^nS_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}-s_k(t)=o_P(1)\) for any t. Assume for simplicity \(X_1\ge 0\) with probability 1. Choose finitely many \(0=t_0< t_1<\cdots < t_L= \tau \) such that for any t there is an \(\ell \) such that \(E\{Y_1(t)-Y_1(t_\ell )\}, E\{Y_1(t_{\ell -1})-Y_1(t)\}\le \epsilon /c_k\), where \(c_k\) is an upper bound on \(X_1^k\exp (\beta _0^\top X_1)\). Then
where the \(o_P(1)\)-term does not depend on t. Combined with a similar lower bound, this yields the desired uniform convergence. If \(pr(X_1<0)>0\) we may split (when \(k=1\)) \(X_{ij}({\hat{\phi }^{I}})\) into a sum of \(X_{ij}({\hat{\phi }^{I}})-\min X_1\) and \(\min X_1\), where \(\min X_1\) denotes the lower bound for the support of \(X_1\) (the essential infimum). Thus, \(n^{-1}\sum _{i=1}^nS_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}\) may be split into a sum of two terms, each of which may be handled as indicated above.\(\square \)
1.3 Proof of Theorem 1: Regression parameters
The multiple-imputation estimator of \(\beta _0\) is \({\hat{\beta }}=m^{-1}\sum _{j=1}^m\hat{\beta }_j\), where the jth imputation estimator \(\hat{\beta }_j\) is the solution to \(U_j(\hat{\beta }_j,\hat{\phi }^I)=0\), with
Following standard arguments and using lemma 4, \({\hat{\beta }_{j}}\) may be shown to be consistent and \(n^{1/2}({\hat{\beta }_{j}}-\beta _0)=n^{-1/2}\left( I^F\right) ^{-1}U_j(\beta _0,\hat{\phi }^I)+o_P(1)\), where \(I^F\) is the full-data information matrix for \(\beta \). Averaging the m estimators we get
As the imputations depend on the initial estimator, \(\hat{\phi }^I\), which involves information from all subjects, this is not a sum of independent and identically distributed terms. We can write
The second term on the right-hand side above converges to zero in probability by lemma 4 and Kosorok (2008, Lemma 4.2). To show that the third term also converges to zero in probability, it suffices (by Kosorok 2008, Lemma 4.2) to show that the second factor in the integrand of (10),
is bounded in probability. The first two terms have mean zero and finite variance and are thus bounded in probability. By stochastic equicontinuity, continuity of the mean and \(n^{1/2}\)-consistency of the initial estimator, the third term is also bounded in probability. Thus,
where \(E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi _0)\}]\) equals zero but has been included for clarity. Using lemma 1 we may write
where \(D_{\mathrm {eff}}(\phi _0)=E(S_{\mathrm {eff}}^F(Z)[{\mathcal {S}}_{\phi _0}(Z)-{\mathcal {S}}_{\phi _0}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}])\). Thus, the last three terms—(13), (14), (15)—may be written as
as (13) is \(o_P(1)\) by the stochastic equicontinuity implied by lemma 2.
Lemma 2 (with a straightforward extension) also implies that
almost surely, uniformly in a neighborhood of \(\phi _0\). Assume for now (for simplicity) that \({\hat{\phi }^{I}}\) is strongly consistent. Then, conditionally on the observed data, for almost every realization,
in distribution by the Lindeberg–Feller central limit theorem (van der Vaart 1998, Proposition 2.27). Using Schenker and Welsh (1988, Lemma 1) or Nielsen (2003, Lemma 1), it follows that (16) also holds unconditionally and that (12) is asymptotically independent of the observed data. Without strong consistency, we may extract from every subsequence a further subsequence along which \({\hat{\phi }^{I}}\) converges almost surely to \(\phi _0\), so that every subsequence has a further subsequence along which (16) holds. Hence the conditional characteristic function of the left-hand side of (16) converges almost surely, along subsequences of subsequences, to the characteristic function of the right-hand side of (16). This implies that the convergence holds in probability for the original sequence of characteristic functions, and since characteristic functions are bounded, (16) holds unconditionally. The asymptotic distribution of \({\hat{\beta }}\) now follows.
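For concreteness, the multiple-imputation point estimator analyzed above may be sketched in code: each of the m completed data sets is analyzed with the ordinary Cox partial-likelihood score, and the m estimates are averaged. The sketch below is our own illustration, not the paper's implementation; it treats a single covariate, assumes no tied event times, and uses a Bernoulli draw as a stand-in for imputation from the fitted conditional covariate model.

```python
import math
import random

def cox_score_info(beta, times, events, x):
    """Partial-likelihood score U(beta) and observed information for a
    single covariate; assumes no tied event times."""
    U = I = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        s0 = s1 = s2 = 0.0
        for j in range(len(times)):
            if times[j] >= times[i]:          # j is at risk at the i-th event
                w = math.exp(beta * x[j])
                s0 += w
                s1 += x[j] * w
                s2 += x[j] ** 2 * w
        U += x[i] - s1 / s0
        I += s2 / s0 - (s1 / s0) ** 2
    return U, I

def cox_newton(times, events, x, beta=0.0, tol=1e-8):
    """Solve U(beta) = 0 by Newton's method; the one-dimensional partial
    likelihood is concave, so the iteration is well behaved here."""
    for _ in range(50):
        U, I = cox_score_info(beta, times, events, x)
        step = U / I
        beta += step
        if abs(step) < tol:
            break
    return beta

def mi_estimate(times, events, x_obs, p_impute, m=20, rng=random):
    """Average the m single-imputation Cox estimates.  Missing entries
    of x_obs (None) are drawn from Bernoulli(p_impute), a stand-in for
    the fitted conditional covariate model of the paper."""
    betas = []
    for _ in range(m):
        x = [xi if xi is not None else int(rng.random() < p_impute)
             for xi in x_obs]
        betas.append(cox_newton(times, events, x))
    return sum(betas) / m
```

A production implementation would of course rely on an established Cox regression routine; the point of the sketch is only the structure "impute m times, solve the complete-data score m times, average".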
1.4 Proof of Theorem 1: Cumulative baseline hazard
The multiple-imputation estimator of the cumulative baseline hazard function is \({\hat{A}}(t)=m^{-1}\sum _{j=1}^m{\hat{A}}_{j}(t,{\hat{\beta }_{j}})\), where
is the estimator from the jth imputation where \(N_{\cdot }(t)=\sum _{i=1}^nN_i(t)\). Let
Then, \(M_{\cdot }(t)=\sum _{i=1}^nM(t,Z_i)\) is a zero-mean square-integrable martingale with respect to the observed filtration.
We may write \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}=n^{1/2}\{{\hat{A}}(t)-{\hat{A}}_0(t)\}+n^{1/2}\{{\hat{A}}_0(t)-A_0(t)\}\), where
Using lemma 4 and Kosorok (2008, Lemma 4.2), we have
Now
The second term of (17) may be rewritten as:
which converges to a Gaussian martingale. Before turning to the first term of (17), we note that
The second term of (18) is asymptotically equivalent to
where \(D_0(u,\phi _0)=E(S_0(u,Z,\beta _0)[{\mathcal {S}}_{\phi _0}(Z)-{\mathcal {S}}_{\phi _0}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}])\) by lemma 1. Thus, we may write the integrand of the first term of (17) as
and hence the first term of (17) as
where the first and the third term are both \(o_P(1)\) (Kosorok 2008, Lemma 4.2). Thus
where the three latter terms converge as processes. To show tightness of the first term, let w(s, t) denote
Then, clearly \(E\{w(s,t)\}=E(E[w(s,t)|\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}_{i=1,\ldots ,n}])=0\) so that
implying (van der Vaart and Wellner 1996, Section 2.2.3) that also the first term of (19) is tight. Finally, we may write \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}\) as a sum of
and
plus \(o_P(1)\)-terms. Proceeding as in the proof of asymptotic normality of the regression parameters, we can show that the terms in (21) are asymptotically independent of the terms in (20) and converge in distribution to a normal distribution. Also the terms in (20) are asymptotically normal. Thus \(n^{1/2}\{{\hat{A}}(t)-A_{0}(t)\}\) converges to a Gaussian process with mean 0.
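For concreteness, the per-imputation Breslow-type estimator \({\hat{A}}_{j}\) accumulates, at each event time u, the increment \({\mathrm {d}}N_{\cdot }(u)/\sum _i Y_i(u)\exp ({\hat{\beta }}X_i)\). The following minimal sketch is our own illustration (single covariate, no tied event times in the test data), not the paper's implementation:

```python
import math

def breslow(times, events, x, beta):
    """Breslow estimate of the cumulative baseline hazard: at each event
    time u, add dN.(u) divided by the sum of exp(beta * x_i) over the
    risk set {i : T_i >= u}."""
    A, path = 0.0, []
    for u in sorted({t for t, d in zip(times, events) if d}):
        d_n = sum(1 for t, d in zip(times, events) if d and t == u)
        s0 = sum(math.exp(beta * xi) for t, xi in zip(times, x) if t >= u)
        A += d_n / s0
        path.append((u, A))                   # jump points of A_hat
    return path
```

With \(\beta =0\) the increments are one over the size of the risk set, so the estimator reduces to the Nelson–Aalen estimator; the multiple-imputation estimator \({\hat{A}}(t)\) then averages these paths over the m completed data sets.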
1.5 Proof of Theorem 1: Joint convergence
To see that \(n^{1/2}({\hat{\beta }}-\beta _0)\) and \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}_{t\in [0,\tau ]}\) converge jointly in distribution, note that we have written both as a sum of terms—(12), (21)—that depend on the imputations but are asymptotically independent of the observed data, terms—(14), (15), (20)—that depend only on the observed data, and terms, that are asymptotically negligible. Joint convergence follows by noting that linear combinations of the “imputation terms”, (12) and (21), are asymptotically independent of the observed data and converge to a normal distribution, while the same linear combinations of the “observed data terms”, (14), (15) and (20), also converge to a normal distribution. Hence, \(n^{1/2}({\hat{\beta }}-\beta _0)\) and \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}_{t\in [0,\tau ]}\) converge jointly in distribution to a Gaussian process.
1.6 Iterating the estimation process
In order to establish asymptotic results for the iterated multiple-imputation estimator, we extend the arguments in the previous parts of the appendix to the case where the “initial estimator” is a multiple-imputation estimator of the type we are considering. We let \({\hat{\phi }}^{(1)}\) denote the multiple-imputation estimator based on the initial imputations and let \(Z_{ij}^{(2)}({\hat{\phi }}^{(1)})\) denote the second iteration imputations, i.e., imputations generated using \({\hat{\phi }}^{(1)}\) as the true parameter. We focus on the asymptotic distribution of \({\hat{\beta }}^{(2)}\), the multiple-imputation estimator of \(\beta _0\) based on the second iteration imputations and outline the changes we need to make to the expansion of the score function given in Eqs. (12)–(15).
Consider first the term (12). Conditional on the observed data and the first iteration imputations, the mean of \(S_{\mathrm {eff}}^F\{Z_{ij}^{(2)}({\hat{\phi }}^{(1)})\}\) equals \(E[S_{\mathrm {eff}}^F\{Z_{ij}^{(2)}(\phi )\}|{\mathcal {C}}_i, G_{{\mathcal {C}}_i}(Z_i)]_{|\phi ={\hat{\phi }}^{(1)}}\), as the second iteration imputations depend on the first iteration imputations only through the first iteration estimator \({\hat{\phi }}^{(1)}\). It follows as before that (12) is asymptotically normal and asymptotically independent of the observed data (and the first iteration imputations).
The terms (13) and (14) are unchanged. Finally, the term (15) may be rewritten as \(D_{\mathrm {eff}}(\phi _0){n^{1/2}}({\hat{\phi }}^{(1)}-\phi _0)\). When plugging in the asymptotic expression for \({n^{1/2}}({\hat{\phi }}^{(1)}-\phi _0)\) derived above, and splitting it into the first iteration imputation part corresponding to (12) and (21) and the rest, we end up with a term (12) depending on the second iteration imputations, which is asymptotically independent of the first iteration imputations, terms depending on the first iteration imputations and the observed data, which are asymptotically independent of the observed data, and terms depending only on the observed data. It now follows that the Cox partial score function is asymptotically normal and it is straightforward to verify that it has the same asymptotic distribution as (5) with \(q_i\) replaced by \(\rho _i=(I^F)^{-1}\xi _i\).
The second iteration estimator of the integrated baseline hazard may be shown to be asymptotically Gaussian by following a similar line of arguments, splitting (21) into a sum of terms depending on the second iteration imputations and terms depending on the first iteration imputations and conditioning as above. Joint convergence follows in a similar manner to what we did for the original multiple-imputation estimator. Further iterations may be handled by splitting the “imputation terms” into additional terms and repeated conditioning.
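The iteration scheme itself (impute from the current parameter estimate, re-estimate from the m completed data sets, repeat) can be illustrated on a deliberately simple toy problem, a Bernoulli mean with values missing completely at random, rather than the Cox model. Everything in the sketch below is our own illustration; in this toy the iteration contracts geometrically towards the observed-data mean, which is its fixed point.

```python
import random

def mi_iterate(x_obs, m=200, iterations=25, theta0=0.5, rng=random):
    """Iterated multiple imputation for a Bernoulli mean: impute the
    missing entries (None) from Bernoulli(current estimate), re-estimate
    the mean as the average over the m completed data sets, repeat."""
    theta, n = theta0, len(x_obs)
    for _ in range(iterations):
        means = []
        for _ in range(m):
            filled = [xi if xi is not None else int(rng.random() < theta)
                      for xi in x_obs]
            means.append(sum(filled) / n)
        theta = sum(means) / m                # next iterate
    return theta
```

Each pass re-generates the imputations from the updated estimate, mirroring the role of \({\hat{\phi }}^{(1)}\) in the second iteration imputations \(Z_{ij}^{(2)}({\hat{\phi }}^{(1)})\) above.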
1.7 Stochastic equicontinuity
Whereas stochastic equicontinuity of the empirical process based on \(m^{-1}\sum _{j=1}^mS_{\mathrm {eff}}^F\{Z_{ij}(\phi _0)\}\) is straightforward to verify when imputing a large class of continuous covariates, we claim that for discrete covariates the combination of the unknown baseline hazard and the inherent discontinuity of the covariate rules out stochastic equicontinuity. To see this, we prove the following lemma:
Lemma 5
The set of sets
with \({\mathcal {X}}\subset {\mathbb {R}}\) is a Vapnik–Chervonenkis (VC) class if and only if \({\mathcal {X}}\) is a finite set.
Proof
Consider a set \(A=\{(x_1,t_1),\ldots ,(x_n,t_n)\}\). Suppose first that \({\mathcal {X}}\) is finite. Then any set of \(n>|{\mathcal {X}}|\) points contains at least two points \((x_{i},t_{i}),(x_{j},t_{j})\) such that \(x_{i}=x_{j}\) and (without loss of generality) \(t_i\le t_j\). Clearly, we cannot pick out a subset of A containing \((x_{i},t_{i})\) but not \((x_{j},t_{j})\): if \(a(t_{i})\ge x_i\) then \(a(t_j)\ge a(t_i)\ge x_i=x_j\). Thus, no sufficiently large set is shattered, and the set of sets is a VC class. If \({\mathcal {X}}\) is not finite, then choosing A such that \(x_1<x_2<\cdots <x_n\) and \(t_1<t_2<\cdots <t_n\), any subset may be picked out: for a subset \(B\subseteq A\), choose a so that it jumps to just above \(x_i\) just before \(t_i\) for every i such that \((x_i,t_i)\in B\). As A can be shattered, the set of sets is not a VC class.\(\square \)
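The two cases in the proof can be checked mechanically for small point sets. The sketch below is our own illustration; it assumes, as the proof suggests, that the class is indexed by nondecreasing functions a, so that a subset can be picked out by \(\{(x,t):a(t)\ge x\}\) if and only if every excluded point \((x_j,t_j)\) satisfies \(x_i<x_j\) for all included \((x_i,t_i)\) with \(t_i\le t_j\) (take a(t) to be the largest included x-value up to time t).

```python
from itertools import chain, combinations

def pickable(points, subset):
    """Can a set {(x, t): a(t) >= x}, with a nondecreasing, pick out
    `subset` from `points`?  Feasible iff no excluded (x_j, t_j) has an
    included (x_i, t_i) with t_i <= t_j and x_i >= x_j."""
    incl = [points[i] for i in subset]
    excl = [p for i, p in enumerate(points) if i not in subset]
    return not any(xi >= xj for xj, tj in excl
                   for xi, ti in incl if ti <= tj)

def shattered(points):
    """Brute-force check that every subset of `points` can be picked out."""
    idx = range(len(points))
    subsets = chain.from_iterable(combinations(idx, k)
                                  for k in range(len(points) + 1))
    return all(pickable(points, set(s)) for s in subsets)
```

Points with strictly increasing x- and t-coordinates are shattered, whereas any two points sharing an x-value defeat shattering, exactly as in the proof.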
Consider imputing a single binary explanatory variable, X, with conditional probability of success given by
Then the simplest way of simulating X is
with \({\tilde{U}}=\text {logit}(U)\), where U is uniformly distributed. Lemma 5 shows that even if we fix \(\beta \) and \(\theta \), these indicator functions are not indicators of a VC class of sets; a fortiori, the class is not VC when \(\beta \) and \(\theta \) are allowed to vary. Dudley (1984, Theorem 11.4.1) shows that when a set of indicator functions is not based on a VC class, the corresponding empirical process is not pregaussian. This essentially rules out stochastic equicontinuity.
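In code, this simulation scheme is a one-line indicator. The sketch below is our own illustration; p stands for the fitted conditional success probability, and the guard against U = 0 is ours. Because logit is increasing, the draw is simply Bernoulli(p), but as a function of the parameter the indicator is discontinuous, which is the source of the difficulty.

```python
import math
import random

def logit(u):
    return math.log(u / (1.0 - u))

def impute_binary(p, rng=random):
    """Draw X = 1{logit(U) <= logit(p)} with U uniform on (0, 1); this
    equals 1{U <= p}, i.e. a Bernoulli(p) draw."""
    u = rng.random()
    while u == 0.0:                 # random() is in [0, 1); avoid logit(0)
        u = rng.random()
    return int(logit(u) <= logit(p))
```

An arbitrarily small change in p (through \(\beta \), \(\theta \) and the baseline hazard) can flip the indicator for a given U, in contrast to imputation of a continuous covariate by a smooth transformation of U.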
This argument shows that the efficient score process with imputed data is not stochastically equicontinuous in general. It does not rule out—though we find it unlikely—that one might construct another simulation scheme for a discrete covariate that is sufficiently “smooth” to make the process stochastically equicontinuous.
Eriksson, F., Martinussen, T. & Nielsen, S.F. Large sample results for frequentist multiple imputation for Cox regression with missing covariate data. Ann Inst Stat Math 72, 969–996 (2020). https://doi.org/10.1007/s10463-019-00716-4