1 Introduction

This paper develops an improved inference procedure for multivariate failure time data with auxiliary information. Large cohort studies often involve thousands of subjects and, especially when failure time outcomes are involved, may last for many years. Due to technical difficulties or financial limitations, the measurement of the primary covariate can often be obtained only for a random subset of the study cohort. On the other hand, auxiliary information that is less precise but highly correlated with the primary exposure can be collected cheaply for all cohort members. The auxiliary information could be a mismeasured surrogate for the true covariate, or any covariate that is informative about it. An example comes from the left ventricular dysfunction (SOLVD 1991) prevention study, which aims to assess the effects of risk factors on the (possibly censored) times to heart failure and to the first myocardial infarction. One of the most important risk factors is the patient's ejection fraction (EF), which can be measured precisely by a standardized radionuclide technique, but at high cost. Therefore, EF was measured only on a randomly chosen subset of the cohort, while a less precise but cheaper measurement of EF was ascertained for all patients using a nonstandardized technique. Because each patient could experience both heart failure and a first myocardial infarction, statistical methods for handling multivariate failure time data with covariate measurement error are required.

Proper use of auxiliary information has been shown to improve the efficiency of survival estimates for multivariate failure time data. For example, Hu and Lin (2004) proposed a corrected estimating function under the assumption that the error is symmetrically distributed. Liu et al. (2009) and Liu et al. (2010) developed estimated pseudo-partial likelihood methods for multivariate failure time data with discrete and continuous auxiliary variables, respectively. Liu et al. (2012) and Fan and Wang (2009) studied this problem under the assumption that the intra-cluster subjects share a common baseline hazard. The above studies are based on the marginal hazards model; the intra-cluster correlation, however, is ignored in the estimation procedures and adjusted for only in the inference step through a robust sandwich variance estimate. Ignoring the intra-cluster correlation can result in some loss of efficiency.

Some authors have proposed incorporating the correlation explicitly into the estimating equations to improve estimation efficiency for multivariate failure time data with fully observed covariates. For example, Cai and Prentice (1995, 1997) added a weight matrix based on the inverse of the correlation matrix of the marginal martingales to the partial likelihood score equation. Simulation studies have shown that their approach is more efficient than one using an independence structure when the cluster size is small. However, their method is computationally intensive when the cluster size is large because it requires estimating a very high-dimensional weighting correlation matrix. To overcome this shortcoming, Xue et al. (2010) developed a different approach by applying the method of quadratic inference functions (QIF). Their method avoids explicitly estimating the correlation parameters and is easy to implement, especially when the cluster size is large. We note that both methods assume the covariates are fully observed and therefore cannot be applied directly to the SOLVD data.

Motivated by the advantages of the QIF method, we extend it to the analysis of multivariate failure time data with auxiliary information. Here, we assume that the auxiliary covariate is continuous. We propose an estimated QIF method and study the asymptotic properties of the proposed estimator. The proposed method inherits the merits of the QIF method: it avoids the estimation of nuisance correlation parameters and is computationally easy to implement. Under certain regularity conditions, we establish the asymptotic normality of the resulting estimator. Simulation studies show that the proposed method improves estimation efficiency over methods that ignore the dependence structure, such as that of Liu et al. (2010). In addition, we study the problem of hypothesis testing and propose a test statistic that has a chi-squared limiting distribution under the null hypothesis.

The rest of the article is organized as follows. In Sect. 2, we introduce the model and describe the proposed estimation procedure. In Sect. 3, the large-sample properties of the proposed estimator are presented. In Sect. 4, a chi-squared test is developed for hypothesis testing. In Sect. 5, the finite-sample performance of the proposed procedures is assessed through extensive simulation studies. We illustrate the proposed method through the analysis of a real data set from the SOLVD study in Sect. 6. Some concluding remarks are given in Sect. 7, and the technical proofs are provided in “Appendix”.

2 Model and estimation

2.1 Preliminaries

Suppose that the whole cohort consists of n independent clusters, and each cluster contains K correlated failure types. Let (ik) denote the kth \((k=1,\cdots , K)\) subject in the ith \((i=1,\cdots ,n)\) cluster. Let \(\widetilde{T}_{ik}\) and \(C_{ik}\) be the potential failure time and the censoring time for subject (ik). With censoring, one observes \(T_{ik} = \mathrm{min}(\widetilde{T}_{ik}, C_{ik})\) and \(\Delta _{ik} = I(\widetilde{T}_{ik} \le C_{ik})\), where \(I(\cdot )\) is the indicator function. Let \(\widetilde{Z}_{ik}(t)\) be a p-vector of possibly time-dependent covariates.

For subject (ik), the hazard function \(\lambda _{ik}(t;\widetilde{Z}_{ik}(t))\) takes the following form:

$$\begin{aligned} \lambda _{ik}(t;\widetilde{Z}_{ik}(t))=\lambda _{0k}(t) \exp \{\beta ^{\mathrm{T}}\widetilde{Z}_{ik}(t)\}, \end{aligned}$$
(2.1)

where \(\beta \) is a p-vector of unknown regression parameters and \(\lambda _{0k}(t)\) is an unspecified marginal baseline hazard function pertaining to the kth failure type.

Note that model (2.1) includes as a special case the failure-type-specific model (Wei et al. 1989; Greene and Cai 2004) \(\lambda _{ik}(t;\widetilde{Z}_{ik}(t)) =\lambda _{0k}(t)\exp \{\beta _k^{\mathrm{T}}\widetilde{Z}_{ik}(t)\}\), which allows for different covariate effects for different k. This can be seen by defining \(\beta =(\beta _1^{\mathrm{T}},\ldots , \beta _k^{\mathrm{T}},\ldots ,\beta _K^{\mathrm{T}})^{\mathrm{T}}\) and \(\widetilde{Z}_{ik}^{*}(t)=(\mathbf{0}^{\mathrm{T}}, \ldots , \widetilde{Z}_{ik}^{\mathrm{T}}(t),\ldots , \mathbf{0}^{\mathrm{T}} )^{\mathrm{T}}\) in the model \(\lambda _{ik}(t;\widetilde{Z}_{ik}^{*}(t)) =\lambda _{0k}(t)\exp \{\beta ^{\mathrm{T}}\widetilde{Z}_{ik}^{*}(t)\}\). For simplicity, we write \(\lambda _{ik}(t;\widetilde{Z}_{ik}(t))\) as \(\lambda _{ik}(t)\) in what follows.

Let \(\Lambda _{0k}(t) = \int _{0}^{t} \lambda _{0k}(u) \mathrm{d}u\) be the marginal cumulative baseline hazard function for the kth failure type. Let \(N_{ik}(t)= \Delta _{ik} I(T_{ik}\le t)\) and \(Y_{ik}(t) =I(T_{ik} \ge t)\) be the observed counting process and the at-risk indicator process, respectively. For convenience, write the relative risk function as \(r_{ik}(t; \beta ) = \exp \{\beta ^{\mathrm{T}}\widetilde{Z}_{ik} (t)\}\). Let \(M_{ik}(t)=N_{ik}(t)-\int _{0}^{t}Y_{ik}(u) r_{ik}(u; \beta _0)\lambda _{0k}(u)\mathrm{d}u\) be the marginal martingale process, where \(\beta _0\) is the true parameter. Given \(\beta \), \(\Lambda _{0k}(t)\) can be estimated consistently by the following Breslow-type estimator (Breslow 1972):

$$\begin{aligned} \widetilde{\Lambda }_{0k}(t;\beta ) = \int _{0}^{t} \frac{\sum _{i=1}^{n} \mathrm{d}N_{ik}(u)}{\sum _{i=1}^{n} Y_{ik}(u) r_{ik}( u;\beta )}. \end{aligned}$$
(2.2)
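To make the estimator concrete, here is a minimal numerical sketch of (2.2): at each observed event time, the increment is the number of events divided by the risk-weighted size of the at-risk set. The function name is illustrative, and the relative risks are treated as time-constant for simplicity.

```python
import numpy as np

def breslow_cumhaz(times, delta, risk, eval_t):
    """Breslow-type estimator (2.2) of Lambda_0k(t) for one failure type.

    times : (n,) observed times T_ik
    delta : (n,) event indicators Delta_ik
    risk  : (n,) relative risks r_ik evaluated at beta (time-constant
            here for simplicity)
    eval_t: time t at which to evaluate the cumulative hazard
    """
    event_times = np.unique(times[(delta == 1) & (times <= eval_t)])
    cumhaz = 0.0
    for u in event_times:
        d_u = np.sum((times == u) & (delta == 1))  # events at time u
        denom = np.sum(risk[times >= u])           # risk-weighted at-risk set
        cumhaz += d_u / denom
    return cumhaz
```

With all relative risks equal to one, this reduces to the Nelson–Aalen estimator.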

Given \(\beta _0\), \( M_{ik}(t)\) can be estimated by

$$\begin{aligned} \widetilde{M}_{ik}(t;\beta _0)=N_{ik}(t)-\int _{0}^{t}Y_{ik}(u) r_{ik}(u;\beta _0) \widetilde{\Lambda }_{0k}(\mathrm{d}u;\beta _0). \end{aligned}$$

Let \(\tau \) be the end time of the study. Write \(\widetilde{\varvec{M}}_i(t;\beta ) =(\widetilde{M}_{i1} (t;\beta ), \ldots , \widetilde{M}_{iK}(t;\beta ))^{\mathrm{T}}\). To improve estimation efficiency, Cai and Prentice (1995) introduced a weight matrix into the pseudo-partial likelihood equation and proposed estimating \(\beta \) by solving the following equation:

$$\begin{aligned} \sum _{i=1}^{n} \int _{0}^{\tau } \varvec{R}_i^{\mathrm{T}} (t;\beta ) W_i(t;\beta ) \widetilde{\varvec{M}}_i(\mathrm{d} t; \beta ) = 0, \end{aligned}$$
(2.3)

where \(W_i(t;\beta )=\widehat{corr}^{-1} (\varvec{M}_i({\varvec{T}}_i; \beta ))\) is the weight matrix with

$$\begin{aligned} \varvec{M}_i({\varvec{T}}_i; \beta ) = (M_{i1}(T_{i1}; \beta ), \ldots , M_{iK}(T_{iK}; \beta ))^{\mathrm{T}}, \end{aligned}$$

and

$$\begin{aligned} \varvec{R}_i^{\mathrm{T}}(t;\beta )=\left( \frac{ r_{i1}^{(1)} (t;\beta )}{ r_{i1}(t;\beta )}, \ldots , \frac{ r_{iK}^{(1)} (t;\beta )}{ r_{iK}(t;\beta )}\right) , \end{aligned}$$

where \(r^{(j)}(t;\beta )\) denotes the jth derivative of \(r(t;\beta )\) with respect to \(\beta \). The weight matrix captures the intra-cluster correlation and is key to improving estimation efficiency. However, when the cluster size K is large, estimating the weight matrix is computationally expensive. To overcome this shortcoming, Xue et al. (2010) proposed a QIF method based on the following generalized estimating equation:

$$\begin{aligned} \sum _{i=1}^{n} \int _{0}^{\tau } \varvec{R}_i^{\mathrm{T}}(t;\beta ) \Xi _i^{1/2}(t; \beta ) {\varvec{\Sigma }}_i^{-1}(\alpha ) \Xi _i^{-1/2}(t; \beta ) \widetilde{\varvec{M}}_i(\mathrm{d} t; \beta )=0, \end{aligned}$$
(2.4)

where \(\Xi _i(t; \beta )= \mathrm{diag}\{\lambda _{i1}(t), \ldots , \lambda _{iK}(t)\}\) and \(\varvec{\Sigma }_i(\alpha )\) is the working correlation matrix whose common structure is specified by a vector of nuisance correlation parameters \(\alpha \). The inverse of the working correlation is approximated by a linear combination of several pre-specified symmetric basis matrices, namely,

$$\begin{aligned} \Sigma ^{-1} \approx a_1B_1 + \ldots + a_LB_L, \end{aligned}$$
(2.5)

where \(B_1, \dots , B_L\) are known basis matrices and \(a_1, \dots , a_L \) are unknown coefficients.
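For concreteness, a standard choice for an exchangeable working correlation is \(B_1 = I\) together with \(B_2\) having 0 on the diagonal and 1 elsewhere; for AR(1), \(B_1 = I\) together with \(B_2\) having 1 on the two first off-diagonals. In the exchangeable case the representation (2.5) is exact with \(L=2\). A small sketch (helper names are illustrative):

```python
import numpy as np

def exchangeable_basis(K):
    """Basis matrices for an exchangeable working correlation:
    B1 = identity, B2 = zeros on the diagonal and ones off it.
    Sigma^{-1} for Sigma = (1-alpha)*I + alpha*J lies exactly in
    span{B1, B2}, so (2.5) is exact with L = 2."""
    return np.eye(K), np.ones((K, K)) - np.eye(K)

def ar1_basis(K):
    """Basis matrices commonly used for an AR(1) working correlation:
    B1 = identity, B2 = ones on the first sub- and super-diagonals
    (the edge-correction basis is often dropped in practice)."""
    return np.eye(K), np.eye(K, k=1) + np.eye(K, k=-1)
```

Because the coefficients \(a_1, \dots , a_L\) enter (2.4) only linearly, they never need to be estimated: the QIF construction below treats each basis term as a separate estimating function.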

Substituting (2.5) into (2.4) leads to a linear combination of the elements of the following vector:

$$\begin{aligned} \varvec{G}_n(\beta )&= \frac{1}{n} \sum _{i=1}^{n} \varvec{g}_i(\beta )\\&=\frac{1}{n} \sum _{i=1}^{n} \left( \begin{array}{c} \int _{0}^{\tau } \varvec{R}_i^{\mathrm{T}}(t;\beta ) \Xi _i^{1/2}(t; \beta ) B_1 \Xi _i^{-1/2}(t; \beta ) \widetilde{\varvec{M}}_i(\mathrm{d} t; \beta )\\ \vdots \\ \int _{0}^{\tau } \varvec{R}_i^{\mathrm{T}}(t;\beta ) \Xi _i^{1/2}(t; \beta ) B_L \Xi _i^{-1/2}(t; \beta ) \widetilde{\varvec{M}}_i(\mathrm{d}t; \beta ) \end{array}\right) . \end{aligned}$$

As there are more equations than unknown parameters, Xue et al. (2010) proposed to estimate \(\beta \) by minimizing the following QIF:

$$\begin{aligned} \varvec{Q}_n(\beta ) = \varvec{G}_n^{\mathrm{T}}(\beta ) \varvec{W}_n^{-1}(\beta ) \varvec{G}_n(\beta ), \end{aligned}$$
(2.6)

where \(\varvec{W}_n(\beta ) = \frac{1}{n^2} \sum _{i=1}^{n} \varvec{g}_i(\beta ) \varvec{g}_i^{\mathrm{T}}(\beta )\). We denote the solution as \(\widetilde{\beta }_Q\) in the following.

In the implementation of the QIF, an additional issue is that the diagonal matrix \(\Xi _i(t; \beta )\) involves the unknown baseline hazard function \(\lambda _{0k}(t)\). Xue et al. (2010) suggested the following kernel-smoothed estimator \(\widetilde{\lambda }_{0k}(t; \beta )\):

$$\begin{aligned} \widetilde{\lambda }_{0k}(t; \beta ) = \frac{1}{\nu _k}\sum _{i=1}^{n} \kappa \left( \frac{t-T_{ik}}{\nu _k} \right) \Delta \widetilde{\Lambda }_{0k}(T_{ik}; \beta ), \end{aligned}$$
(2.7)

where \(\kappa (\cdot )\) is the Epanechnikov kernel function with \(\nu _k\) being the rule-of-thumb bandwidth, and \(\Delta \widetilde{\Lambda }_{0k}(t; \beta )=\widetilde{\Lambda }_{0k}(t; \beta )-\widetilde{\Lambda }_{0k}(t-; \beta )\) with \(\widetilde{\Lambda }_{0k}(t; \beta )\) being the Breslow estimator given in (2.2).
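A minimal sketch of the smoother (2.7), assuming the Breslow increments \(\Delta \widetilde{\Lambda }_{0k}(T_{ik};\beta )\) at the observed event times are already available (names are illustrative):

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel: 0.75*(1 - x^2) on [-1, 1], 0 outside."""
    return np.where(np.abs(x) <= 1, 0.75 * (1.0 - x**2), 0.0)

def smoothed_baseline_hazard(t, event_times, cumhaz_jumps, bw):
    """Kernel-smoothed baseline hazard (2.7): average the Breslow
    increments Delta Lambda_0k(T_ik) with Epanechnikov weights.

    event_times : observed times T_ik at which the Breslow estimator jumps
    cumhaz_jumps: the jump sizes Delta Lambda_0k(T_ik; beta)
    bw          : bandwidth nu_k
    """
    return np.sum(epanechnikov((t - event_times) / bw) * cumhaz_jumps) / bw
```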

2.2 Estimated QIF for marginal hazards model

Consider the situation in which the primary covariate can be ascertained only in a validation set. Let \(\widetilde{Z}_{ik}(t)\) consist of two parts, \(X_{ik}(t)\) and \(Z_{ik}(t)\), where \(X_{ik}(t)\) is the primary covariate, observed only in the validation set, and \(Z_{ik}(t)\) is the vector of the remaining covariates, which are measured precisely for the full cohort. Accordingly, write the true parameter as \(\beta _0=(\beta _{10}, \beta _{20})\), with \(\beta _{10}\) and \(\beta _{20}\) pertaining to \(X_{ik}(t)\) and \(Z_{ik}(t)\), respectively. Let A(t) denote a time-dependent auxiliary variable for the primary covariate X(t); \(A(\cdot )\) can be measured for all cohort members. Suppose that A provides no additional information to the model once X is given, i.e.,

$$\begin{aligned} \lambda (t; X(t), Z(t), A(t)) \equiv \lambda (t; X(t), Z(t)). \end{aligned}$$

Let \(\eta _{ik} = 1\) or 0 indicate whether or not subject (ik) is in the validation set. Let \(V_k = \{i : \eta _{ik} = 1\}\) and \(\overline{V}_k = \{i : \eta _{ik} = 0\}\) denote the kth marginal validation set and non-validation set, respectively. Then the observed data are:

$$\begin{aligned} \begin{array}{rl} \{T_{ik}, \Delta _{ik}, \eta _{ik}, Y_{ik}(t), X_{ik}(t), Z_{ik}(t), A_{ik}(t)\} &{} \quad \text{ if } i\in V_k\\ \{T_{ik}, \Delta _{ik}, \eta _{ik}, Y_{ik}(t), Z_{ik}(t), A_{ik}(t)\} &{} \quad \text{ if } i \in \overline{V}_k. \end{array} \end{aligned}$$

According to Liu et al. (2009), when subject (ik) is in the non-validation set, the hazard function given the observed data can be written as:

$$\begin{aligned} \lambda _{ik}(t; Z_{ik}(t), A_{ik}(t))&= \lambda _{0k}(t) e^{\beta _2^{\mathrm{T}} Z_{ik}(t)}E\{e^{\beta _1^{\mathrm{T}}X_{ik}(t)}| Y_{ik}(t)=1, A_{ik}(t), Z_{ik}(t)\}\\&= \lambda _{0k}(t) e^{\beta _2^{\mathrm{T}}Z_{ik}(t)}E\{e^{\beta _1^{\mathrm{T}}X_{ik}(t)}| Y_{ik}(t)=1, A_{ik}^{*}(t)\}, \end{aligned}$$

where \(A_{ik}^{*}(t)\) denotes all the available auxiliary information, which may include the auxiliary covariate \(A_{ik}(t)\) and part of \(Z_{ik}(t)\). Therefore, the induced relative risk function is

$$\begin{aligned} r_{ik}(t;\beta ) = \varphi _{ik}( t;\beta )\eta _{ik} +\psi _{ik}(t;\beta )(1-\eta _{ik}), \end{aligned}$$

where \(\varphi _{ik}(t;\beta ) = \exp \{\beta _1^{\mathrm{T}} X_{ik}(t)+\beta _2^{\mathrm{T}}Z_{ik}(t)\}\), and

$$\begin{aligned} \psi _{ik}(t;\beta ) = \exp \{\beta _2^{\mathrm{T}}Z_{ik}(t)\} E\{e^{\beta _1^{\mathrm{T}}X_{ik}(t)}| Y_{ik}(t)=1, A_{ik}^{*}(t)\}. \end{aligned}$$

If the conditional density of \(X_{ik}\), written as \(f( x|T_{ik}\ge t, A_{ik}^*)\), were a known function up to a parameter \(\theta \), then \((\beta ,\theta )\) could be estimated by replacing the risk function in equation (2.3) or (2.4) with the induced risk function. However, misspecification of such a parameterization may lead to biased estimates. We therefore estimate \(\psi _{ik}(t;\beta )\) by an empirical (nonparametric) method and substitute the estimate into the estimation procedure.

In this paper, we consider the frequently encountered case in which both the primary covariate \(X_{ik}(t)\) and the auxiliary variable \(A_{ik}^*(t)\) are one-dimensional. The unknown part of the induced relative risk function in the non-validation set is estimated by the kernel smoothing method:

$$\begin{aligned} \widehat{\psi }_{ik}(t; \beta ) = \frac{\sum _{j \in V_k}Y_{jk}(t) \Psi \{\mu _k^{-1}(A_{jk}^{*}(t)-A_{ik}^{*}(t))\} e^{\beta _1 X_{jk}(t)}}{\sum _{j \in V_k}Y_{jk}(t) \Psi \{\mu _k^{-1}(A_{jk}^{*}(t)-A_{ik}^{*}(t))\}} e^{\beta _2^{\mathrm{T}}Z_{ik}(t)}, \end{aligned}$$
(2.8)

where \(\Psi (\cdot )\) is a kernel function and \(\mu _k\) is the bandwidth. When the denominator is 0, the relative risk is imputed by interpolation. Therefore, the estimate of the relative risk is

$$\begin{aligned} \widehat{r}_{ik}(t; \beta ) = \varphi _{ik}(t; \beta )\eta _{ik} +\widehat{\psi }_{ik}(t; \beta )(1-\eta _{ik}). \end{aligned}$$
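The estimator (2.8) is a Nadaraya–Watson-type weighted average of \(e^{\beta _1 X_{jk}(t)}\) over the at-risk validation subjects, with kernel weights determined by the distance between auxiliary values. A sketch at a fixed time point, for scalar \(X\) and \(A^*\) and an Epanechnikov kernel (names are illustrative):

```python
import numpy as np

def psi_hat(a_star_i, z_i, beta1, beta2, at_risk_V, a_star_V, x_V, bw):
    """Kernel estimate (2.8) of the induced risk psi_ik(t; beta) for a
    non-validation subject, evaluated at a fixed time t.

    a_star_i : auxiliary value A*_ik(t) of the target subject (scalar)
    z_i      : fully observed covariate Z_ik(t) (scalar here)
    at_risk_V, a_star_V, x_V : at-risk indicators, auxiliary values and
        true covariates of the validation-set subjects at time t
    bw       : bandwidth mu_k
    """
    u = (a_star_V - a_star_i) / bw
    w = at_risk_V * np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)
    num = np.sum(w * np.exp(beta1 * x_V))     # weighted E[exp(beta1*X)]
    den = np.sum(w)
    return (num / den) * np.exp(beta2 * z_i)
```

When `den` is 0 (no validation subject with a nearby auxiliary value is at risk), the interpolation fallback mentioned above would be used instead.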

Replacing \( r_{ik}(t; \beta )\) by \(\widehat{r}_{ik}(t; \beta )\) in the notation of Sect. 2.1, we obtain estimated versions of \(\varvec{R}_i, \widetilde{\Lambda }_{0k}, \lambda _{0k}, \Xi _i, \varvec{\widetilde{M}}_i, \varvec{g}_i\) and \(\varvec{G}_n\), which we write as \(\widehat{\varvec{R}}_i, \widehat{\Lambda }_{0k}, \widehat{\lambda }_{0k}, \widehat{\Xi }_i\), \(\varvec{\widehat{M}}_i, \widehat{\varvec{g}}_i\) and \(\widehat{\varvec{G}}_n\), respectively. This yields the estimated QIF

$$\begin{aligned} \widehat{\varvec{Q}}_n(\beta ) = \widehat{\varvec{G}}_n^{\mathrm{T}}(\beta ) \widehat{\varvec{W}}_n^{-1}(\beta ) \widehat{\varvec{G}}_n(\beta ), \end{aligned}$$

where \(\widehat{\varvec{W}}_n(\beta ) = \frac{1}{n^2} \sum _{i=1}^{n} \widehat{\varvec{g}}_i(\beta ) \widehat{\varvec{g}}_i^{\mathrm{T}}(\beta )\). Then \(\beta _0\) can be estimated by minimizing \(\widehat{\varvec{Q}}_n(\beta )\), i.e.,

$$\begin{aligned} \widehat{\beta }_Q = \underset{\beta }{\mathrm{argmin}} \widehat{\varvec{Q}}_n(\beta ). \end{aligned}$$
(2.9)

To reduce the computational burden, we approximate the first- and second-order derivatives of \(\widehat{\varvec{Q}}_n(\beta )\) as in Qu et al. (2000):

$$\begin{aligned} \widehat{\varvec{Q}}_n^{(1)} (\beta )= & {} 2 \{\widehat{\varvec{G}}_n^{(1)}(\beta )\}^{\mathrm{T}} \widehat{\varvec{W}}_n^{-1} (\beta ) \widehat{\varvec{G}}_n(\beta )+o_p(1),\\ \widehat{\varvec{Q}}_n^{(2)} (\beta )= & {} 2\{\widehat{\varvec{G}}_n^{(1)}(\beta )\}^{\mathrm{T}} \widehat{\varvec{W}}_n^{-1} (\beta ) \widehat{\varvec{G}}_n^{(1)}(\beta )+o_p(1). \end{aligned}$$

Then, the Newton–Raphson algorithm can be applied using these approximations.
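A sketch of the resulting Newton–Raphson iteration, assuming callables that evaluate \(\widehat{\varvec{G}}_n\), its Jacobian, and \(\widehat{\varvec{W}}_n^{-1}\) at a given \(\beta \) (names are illustrative):

```python
import numpy as np

def qif_newton(G, dG, W_inv, beta0, tol=1e-8, max_iter=50):
    """Newton-Raphson for minimizing Q_n(beta) = G^T W^{-1} G, using the
    derivative approximations above (Qu et al. 2000):
        Q'  ~ 2 dG^T W^{-1} G,    Q'' ~ 2 dG^T W^{-1} dG.

    G, dG, W_inv : callables of beta returning the (L*p,) moment vector,
        its (L*p, p) Jacobian, and the (L*p, L*p) inverse weight matrix.
    """
    beta = np.atleast_1d(np.asarray(beta0, dtype=float))
    for _ in range(max_iter):
        g, J, Wi = G(beta), dG(beta), W_inv(beta)
        grad = 2.0 * J.T @ Wi @ g       # approximate first derivative
        hess = 2.0 * J.T @ Wi @ J       # approximate second derivative
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because the Hessian approximation \(2\,\mathrm{d}G^{\mathrm{T}} W^{-1} \mathrm{d}G\) is positive semi-definite by construction, each step is a descent direction for the quadratic form.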

3 Asymptotic properties

In this section, we present the asymptotic properties of the proposed estimated QIF estimator \(\widehat{\beta }_Q\), and provide standard error formula for it.

Let \(n_k\) denote the number of subjects in \({V}_k\) and assume \(n_k/n \rightarrow \rho _k\) as \(n \rightarrow \infty \), where \(\rho _k\) represents the probability that subject (ik) is sampled into the kth marginal validation set. Under the conditions listed in “Appendix”, we establish the asymptotic behavior of \(\widehat{\beta }_Q\) in the following theorems.

Theorem 1

Under conditions (C1)–(C9) in “Appendix”, the following results hold:

  1. (I)

    The proposed estimator \(\widehat{\beta }_Q\) is a consistent estimator of \(\beta _0\).

  2. (II)

    \(\sqrt{n}(\widehat{\beta }_Q - \beta _0)\) is asymptotically normally distributed with mean zero and variance matrix \(\Sigma _Q(\beta _0)= (J_0^{\mathrm{T}} \varvec{W}_0^{-1}J_0)^{-1}\), where \(J_0 = J(\beta _0)\), with

    $$\begin{aligned} J(\beta _0) = \left( \begin{array}{c} \Gamma (\beta _0, B_1)\\ \vdots \\ \Gamma (\beta _0, B_L) \end{array}\right) , \end{aligned}$$

    and \(\Gamma (\beta _0, B)\) and \(\varvec{W}_0\) are defined in (A.1) and (A.2) in “Appendix”, respectively.

The asymptotic covariance \(\Sigma _Q(\beta _0)\) can be consistently estimated by

$$\begin{aligned} \widehat{\Sigma }_Q(\widehat{\beta }_Q) = \{ \widehat{J}^{\mathrm{T}} (\widehat{\beta }_Q) \widehat{\varvec{W}}^{-1} \widehat{J}(\widehat{\beta }_Q) \}^{-1}, \end{aligned}$$

where

$$\begin{aligned} \widehat{J}(\beta ) = \left( \begin{array}{c} \widehat{\Gamma }(\beta , B_1)\\ \vdots \\ \widehat{\Gamma }(\beta , B_L) \end{array}\right) , \end{aligned}$$

with

$$\begin{aligned} \widehat{\Gamma }(\beta , B)&= \frac{1}{n} \sum _{i=1}^{n} \sum _{k=1}^{K} \Delta _{ik} \left( \sum _{m=1}^{K} \frac{\partial \widehat{R}_{im}(T_{ik}; \beta ) \widehat{\phi }_{imk}(T_{ik};\beta , B)}{\partial \beta ^{\mathrm{T}}} -\frac{\widehat{S}_k^{3}(T_{ik}; \beta , B)}{\widehat{S}_k^{0}(T_{ik}; \beta )} \right. \\&\quad \left. -\frac{\widehat{S}_k^{4}(T_{ik}; \beta , B) +\widehat{S}_k^{5}(T_{ik}; \beta , B)}{\widehat{S}_k^{0}(T_{ik}; \beta )} +\frac{\widehat{S}_k^{2}(T_{ik}; \beta , B)(\widehat{S}_k^{1}(T_{ik}; \beta ))^{\mathrm{T}}}{(\widehat{S}_k^{0}(T_{ik}; \beta ))^2} \right) , \end{aligned}$$

and

$$\begin{aligned} \widehat{\varvec{W}} = \{\widehat{\varvec{\omega }} (\widehat{\beta }_Q, B_j, B_{j^{'}})\}_{j, j^{'} =1}^{L}, \end{aligned}$$

where

$$\begin{aligned} \widehat{\varvec{\omega }}(\beta , B_j, B_{j^{'}}) = n^{-1} \sum _{i=1}^{n} \sum _{k=1}^{K} \sum _{m=1}^{K} \widehat{U}_{ik} (\beta , B_j) \widehat{U}_{im}^{\mathrm{T}}(\beta , B_{j^{'}}), \end{aligned}$$

with

$$\begin{aligned} \widehat{U}_{ik}(\beta , B)&=\Delta _{ik}\widehat{\varvec{\zeta }}_{ik} (T_{ik};\beta , B)-\sum _{l=1}^{n} \Delta _{lk} \widehat{\varvec{\zeta }}_{ik}(T_{lk};\beta , B) \frac{Y_{ik}(T_{lk}) \widehat{r}_{ik}(T_{lk}; \beta )}{n\widehat{S}_k^{0}(T_{lk}; \beta )} \\&\quad - \frac{n-n_k}{n_k} \sum _{l=1}^{n} \Delta _{lk} \widehat{\varvec{\zeta }}_{ik}(T_{lk};\beta , B) \frac{Y_{ik}(T_{lk}) (\widehat{r}_{ik}(T_{lk}; \beta ) - \widehat{\psi }_{ik}(T_{lk}; \beta ))}{n \widehat{S}_k^{0}(T_{lk}; \beta )} , \end{aligned}$$

where \(\widehat{S}_k^{d}\ (d=0,\ldots ,5)\) and \(\widehat{\varvec{\zeta }}_{ik} (t;\beta , B)\) are defined in “Appendix”.

Remark 1

As a special case of the estimated QIF estimator, the estimator under the independence working correlation, denoted \(\widehat{\beta }_I\), coincides with the EPPL estimator of Liu et al. (2010). However, different expressions for the variance matrix \(\Sigma _I(\beta _0)\) of the asymptotic distribution of \(\sqrt{n}(\widehat{\beta }_I - \beta _0)\) and for its estimator \(\widehat{\Sigma }_I(\widehat{\beta }_I)\) are obtained under conditions (C1)–(C8) in “Appendix”. From the corresponding expressions for \(\Sigma _Q(\beta _0)\) and \(\widehat{\Sigma }_Q(\widehat{\beta }_Q)\), we obtain

$$\begin{aligned} \Sigma _I(\beta _0) =\Gamma ^{-1}(\beta _0, I) \varvec{\omega }(\beta _0, I, I) \Gamma ^{-1}(\beta _0, I), \end{aligned}$$

and

$$\begin{aligned} \widehat{\Sigma }_I(\widehat{\beta }_I)=\widehat{\Gamma }^{-1}(\widehat{\beta }_I, I) \widehat{\varvec{\omega }} (\widehat{\beta }_I, I, I) \widehat{\Gamma }^{-1}(\widehat{\beta }_I, I). \end{aligned}$$

4 Inference on hazard ratio parameters

The QIF is built on an objective function, which provides a natural way to make inference about the hazard ratio parameter \(\beta \). Suppose that \(\beta \) is partitioned into \(\gamma \) and \(\delta \), where \(\gamma \) is a vector of hazard ratio parameters of interest with dimension \(p_0\), and \(\delta \) is a vector of nuisance parameters with dimension \(p-p_0\). As a special case, we also allow \(p_0=p\), in which case \(\beta = \gamma \) and \(\delta \) is absent.

To test

$$\begin{aligned} H_0: \gamma =\gamma _0 \quad \text{ versus } \quad H_1: \gamma \ne \gamma _0, \end{aligned}$$

we propose a test statistic

$$\begin{aligned} T=\widehat{\varvec{Q}}_n(\gamma _0, \widetilde{\delta }) -\widehat{\varvec{Q}}_n(\widehat{\gamma }, \widehat{\delta } ), \end{aligned}$$
(4.1)

where

$$\begin{aligned} \widetilde{\delta } = \underset{\delta }{\mathrm{argmin}} \widehat{\varvec{Q}}_n(\gamma _0, \delta ), \quad (\widehat{\gamma }, \widehat{\delta }) = \underset{(\gamma , \delta )}{\mathrm{argmin}} \widehat{\varvec{Q}}_n(\gamma , \delta ). \end{aligned}$$

The values of \(\widehat{\varvec{Q}}_n(\gamma _0, \widetilde{\delta })\) and \(\widehat{\varvec{Q}}_n(\widehat{\gamma }, \widehat{\delta } )\) measure how well the model fits the data under \(H_0\) and \(H_1\), respectively. Under \(H_0\), the difference between them should be small, whereas under \(H_1\), \(\widehat{\varvec{Q}}_n(\gamma _0, \widetilde{\delta })\) should be systematically larger than \(\widehat{\varvec{Q}}_n(\widehat{\gamma }, \widehat{\delta } )\).
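Once the two minimized QIF values are available, carrying out the test amounts to comparing their difference with a chi-squared critical value. A sketch for the special case \(p_0=1\), where the chi-squared tail probability has a closed form through the error function (names are illustrative):

```python
import math

def chisq1_pvalue(T):
    """Tail probability P(chi2_1 > T), using the identity
    P(chi2_1 > T) = 2*(1 - Phi(sqrt(T))) = 1 - erf(sqrt(T/2))."""
    return 1.0 - math.erf(math.sqrt(T / 2.0))

def qif_test(Q_restricted, Q_full, level=0.05):
    """QIF-difference test (4.1) of H0: gamma = gamma_0 for p_0 = 1.

    Q_restricted : minimized QIF under H0, i.e. Q_n(gamma_0, delta~)
    Q_full       : minimized QIF under the full model, Q_n(gamma^, delta^)
    Returns the statistic T, its p-value, and the rejection decision.
    """
    T = Q_restricted - Q_full
    p = chisq1_pvalue(T)
    return T, p, p < level
```

For general \(p_0\), the p-value would instead come from the upper tail of the \(\chi ^2_{p_0}\) distribution (e.g. via a numerical incomplete-gamma routine).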

Theorem 2

Suppose conditions (C1)–(C9) in “Appendix” are satisfied. Then, under \(H_0\), the test statistic T asymptotically follows a chi-squared distribution with \(p_0\) degrees of freedom.

Comment

To prove Theorem 2, we use the decomposition

$$\begin{aligned} \widehat{\varvec{Q}}_n(\gamma _0, \widetilde{\delta }) -\widehat{\varvec{Q}}_n(\widehat{\gamma }, \widehat{\delta })&=(\widehat{\varvec{Q}}_n(\gamma _0, \widetilde{\delta }) - \varvec{Q}_n(\gamma _0, \widetilde{\delta }))-(\widehat{\varvec{Q}}_n(\widehat{\gamma }, \widehat{\delta }) -\varvec{Q}_n(\widehat{\gamma }, \widehat{\delta })) \\&\quad + (\varvec{Q}_n(\gamma _0, \widetilde{\delta }) -\varvec{Q}_n(\gamma _0, \widetilde{\delta }^*)) -(\varvec{Q}_n(\widehat{\gamma }, \widehat{\delta }) -\varvec{Q}_n(\widehat{\gamma }^*, \widehat{\delta }^*))\\&\quad +(\varvec{Q}_n(\gamma _0, \widetilde{\delta }^*) -\varvec{Q}_n(\widehat{\gamma }^*, \widehat{\delta }^*)), \end{aligned}$$

where \(\varvec{Q}_n\) is defined as in (2.6), \(\widetilde{\delta }^* = \underset{\delta }{\mathrm{argmin}} \varvec{Q}_n(\gamma _0, \delta )\), and \((\widehat{\gamma }^*, \widehat{\delta }^*) = \underset{(\gamma , \delta )}{\mathrm{argmin}} \varvec{Q}_n(\gamma , \delta )\). From the proof of Theorem 1, the first two bracketed terms are \(o_p(1)\). In addition, since both \(\widehat{\beta }_Q\) and \(\widetilde{\beta }_Q\) are consistent estimators of \(\beta _0\) (by Theorem 1 above and Theorem 1 of Xue et al. 2010), the third and fourth bracketed terms are also \(o_p(1)\). Finally, the last term converges in distribution to a chi-squared random variable with \(p_0\) degrees of freedom; the proof of this last step is similar to that of Theorem 1 in Qu et al. (2000).

5 Simulation studies

In this section, we conduct simulation studies to evaluate the finite-sample behavior of the proposed method. We first evaluate the performance of the proposed estimator in Sect. 5.1 and then the performance of the inference method in Sect. 5.2.

5.1 Performance of estimated QIF estimator

We compare the proposed estimator with the QIF estimator of Xue et al. (2010) based only on the validation set, and with the EPPL estimator (\(\widehat{\beta }_I\)) of Liu et al. (2010), which utilizes the auxiliary information but does not account for the intra-cluster correlation when estimating \(\beta \). The proposed estimator takes both the intra-cluster correlation and the auxiliary information into account.

The covariates \((X_{i1}, X_{i2}, \ldots , X_{iK})\), which would be observed only in the validation sets in a real study, are generated independently from the uniform distribution U(0, 1). The covariates \((Z_{i1}, Z_{i2}, \ldots , Z_{iK})\) are independent binary covariates taking the value one with probability 0.5. The multivariate failure times \(\widetilde{T}_{i1}, \widetilde{T}_{i2}, \ldots , \widetilde{T}_{iK}\) are generated from the multivariate Clayton (Clayton and Cuzick 1985) model with joint survival function

$$\begin{aligned} S(t_1, \ldots , t_K; D_1, \ldots , D_K) = \left( \sum _{k=1}^{K} \exp (\theta ^{-1}\lambda _{0k}t_ke^{\beta _k^{\mathrm{T}} D_k}) -(K-1)\right) ^{-\theta }, \end{aligned}$$

where \(D_k=(X_k^{\mathrm{T}}, Z_k^{\mathrm{T}})^{\mathrm{T}}\), \(\beta _k\), which may vary with the failure type, is the corresponding parameter for \(D_k\), and \(\theta \) is the dependence parameter; a larger value of \(\theta \) represents weaker dependence between the failure times. We set \(\theta =0.25\), 0.5 or 2, representing varying degrees of correlation among the generated failure times, and set the baseline hazard function \(\lambda _{0k}=1\). The failure times \((t_1, \ldots , t_K)\) are generated by the algorithm described in Cai and Shen (2000) through

$$\begin{aligned} t_1&= -e^{-\beta _1^{\mathrm{T}}D_1} \ln (1-u_1), \\ t_k&= \theta e^{-\beta _k^{\mathrm{T}} D_k} \ln \left[ (k-1)-\sum _{i=1}^{k-1}a_i+\left( \sum _{i=1}^{k-1}a_i-(k-2)\right) (1-u_k)^{-(\theta + k -1)^{-1}} \right] , \end{aligned}$$

for \(k=2, \ldots , K\), where \(a_i=\exp (\theta ^{-1}t_i e^{\beta _i^{\mathrm{T}} D_i})\) for \(i=1, \dots , k-1\), and \((u_1,\ldots , u_K)\) are generated independently from the uniform distribution on (0, 1). Censoring times are generated from U(0, c), where c is a constant chosen to achieve the specified censoring rate.
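The sequential generation above can be sketched as follows (names are illustrative; \(\lambda _{0k}=1\) as in the simulation setting):

```python
import numpy as np

def clayton_times(beta_D, theta, u):
    """Generate clustered failure times (t_1, ..., t_K) from the Clayton
    model via the sequential algorithm of Cai and Shen (2000).

    beta_D : (K,) linear predictors beta_k^T D_k
    theta  : Clayton dependence parameter (larger = weaker dependence)
    u      : (K,) uniform(0,1) variates
    """
    K = len(u)
    t = np.empty(K)
    # t_1 = -exp(-beta_1^T D_1) * log(1 - u_1)
    t[0] = -np.exp(-beta_D[0]) * np.log(1.0 - u[0])
    for k in range(1, K):  # code index k corresponds to failure type k+1
        a = np.exp(t[:k] * np.exp(beta_D[:k]) / theta)   # a_i terms
        s = a.sum()
        inner = (k - s) + (s - (k - 1)) * (1.0 - u[k]) ** (-1.0 / (theta + k))
        t[k] = theta * np.exp(-beta_D[k]) * np.log(inner)
    return t
```

In the simulations, `u` would be drawn with `np.random.uniform(size=K)` and `beta_D` computed from the generated \(X_k\) and \(Z_k\).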

Since the true correlation structure of the Clayton model is exchangeable, the working correlation in (2.9) is taken to be exchangeable; the corresponding estimated QIF estimator is denoted \(\widehat{\beta }_{QE}\). We also compute an estimated QIF estimator \(\widehat{\beta }_{QA}\) using a misspecified AR(1) working correlation. The corresponding estimators of the QIF method based only on the validation set are denoted \(\widetilde{\beta }_{QEV}\) and \(\widetilde{\beta }_{QAV}\), respectively.

To estimate the induced relative risk function and the baseline hazard function, we apply the Epanechnikov kernel in (2.8) and (2.7) with bandwidths \( \mu _k = 2 \widehat{\sigma }(A_{V_k})n_k^{-1/3} \) and \( \nu _k = 1.06 \widehat{\sigma }(T_k)n^{-1/5} \), respectively, where \(\widehat{\sigma }(\cdot )\) is the sample standard deviation and \(A_{V_k}\) is the part of the auxiliary covariate \(A_{k}\) in the kth marginal validation set. We use nearest-neighbor interpolation to estimate the induced relative risk function when the denominator in (2.8) is 0. In addition, because the Epanechnikov kernel has bounded support, \(\widehat{\lambda }_{0k}(t)\) may be 0 at some locations, which would make the diagonal matrix \(\widehat{\Xi }_i(t;\beta )\) singular. If this happens, we replace \(\widehat{\lambda }_{0k}(t)\) with the average of its values at the non-zero locations.
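The two rule-of-thumb bandwidths are direct to compute; a minimal sketch (names are illustrative):

```python
import numpy as np

def bandwidths(A_val, T_k, n, n_k):
    """Rule-of-thumb bandwidths used in the simulations:
    mu_k for the induced-risk smoother (2.8) and nu_k for the baseline
    hazard smoother (2.7).

    A_val : auxiliary values in the kth marginal validation set
    T_k   : observed times for failure type k
    n, n_k: cohort size and kth validation-set size
    """
    mu_k = 2.0 * np.std(A_val, ddof=1) * n_k ** (-1.0 / 3.0)
    nu_k = 1.06 * np.std(T_k, ddof=1) * n ** (-1.0 / 5.0)
    return mu_k, nu_k
```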

We consider two types of simulations:

  1. (1)

    \(\beta _k\)s are the same for all failure types, i.e., \(\beta _1=\ldots =\beta _K=\beta \).

  2. (2)

    \(\beta _k\) varies across failure types.

The auxiliary covariate \(A_k\) is generated from \(X_k\) via

$$\begin{aligned} A_k = X_k + \epsilon _k, \end{aligned}$$

where \(\epsilon _k\) follows a normal distribution \(N(0,\sigma ^2)\); the positive parameter \(\sigma \) controls the strength of association between \(A_k\) and \(X_k\). Each simulation is repeated 1000 times.

5.2 Simulation study (1)

In the first simulation, we set the true parameter \(\beta =(\beta ^{(1)},\beta ^{(2)})^{\mathrm{T}}=(0.693,-0.2)^{\mathrm{T}}\), validation proportion \(\rho _k=0.5\), association parameter \(\sigma =0.1\). The number of independent clusters is \(n=200\), with \(K=4\) or 8 failure types in each cluster.

Tables 1, 2 and 3 demonstrate the simulation results for estimates of parameter \(\beta \) for each method under different censoring rates 10%, 40% and 80%. The sample mean and sample standard deviation of the 1000 estimates, the average of estimated standard errors and the coverage rate of the 95% confidence intervals for the true parameter are listed in the Est, SD, SE and CR columns, respectively. RE, the ratio of the empirical variance of \(\widehat{\beta }_I\) to that of \(\widehat{\beta }_{QE}\) or \(\widehat{\beta }_{QA}\), is the estimated relative efficiency of estimated QIF estimators relative to \(\widehat{\beta }_I\). We summarize the results as follows: (i) The estimates of all the methods are all approximately unbiased. Moreover, the estimators of the asymptotic standard errors are approximately equal to the empirical standard deviations. The corresponding 95% confidence intervals calculated by the estimated standard errors provide reasonable coverage rates. This suggests that the estimates of asymptotic standard errors for all methods work well. (ii) For each considered scenario, the estimator \(\widehat{\beta }_I\) using auxiliary information is more efficient than the estimators \(\widetilde{\beta }_{QEV}\) and \(\widetilde{\beta }_{QAV}\) using validation set only. However, \(\widehat{\beta }_I\) loses efficiency when the degree of correlation within a cluster becomes stronger. (iii) As K increases, the empirical standard deviations (SD) of all the estimators decrease. That is naturally because of the increase in the total amount of data. (iv) As \(\theta \) decreases, the efficiency gain of estimated QIF estimators relative to \(\widehat{\beta }_I\) increases. From Table 1, estimated QIF estimators are more efficient than the other estimators for all combinations of \(\theta \) and K. We also observe the same trend from Table 2. 
From Table 3, however, the estimated QIF estimators are in several cases less efficient than \(\widehat{\beta }_I\), although the REs are very close to 1, owing to the reduced intra-cluster correlation when the censoring rate is 80%. Furthermore, as expected, \(\widehat{\beta }_{QE}\) with the correct working correlation is always more efficient than \(\widehat{\beta }_{QA}\) with a misspecified working correlation. (v) The validation proportion of the incomplete covariate affects the values of RE, especially for the first parameter. For example, when \(K = 4 \) and the censoring rate is 10%, the REs of \(\widehat{\beta }_{QE}\) relative to \(\widehat{\beta }_I\) for \(\widehat{\beta }^{(1)}\) decrease from (3.34, 2.26, 1.21) to (2.85, 2.03, 1.18) as the validation proportion decreases from 0.5 to 0.3 (results not shown). However, when we increased n, both the CRs and the REs increased.
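The summary columns above (Est, SD, SE, CR, RE) are simple functions of the replicated estimates. The following sketch shows how they can be computed; the function and variable names are hypothetical and not taken from the paper's code.

```python
import numpy as np

def summarize(est, se, beta_true, z=1.96):
    """Summarize 1000 replicated estimates of one scalar parameter.

    est : array of point estimates across replications
    se  : array of estimated standard errors across replications
    Returns (Est, SD, SE, CR), matching the simulation-table columns.
    """
    est, se = np.asarray(est), np.asarray(se)
    Est = est.mean()              # sample mean of the estimates
    SD = est.std(ddof=1)          # empirical standard deviation
    SE = se.mean()                # average of estimated standard errors
    covered = np.abs(est - beta_true) <= z * se
    CR = covered.mean()           # coverage rate of the 95% CIs
    return Est, SD, SE, CR

def relative_efficiency(est_I, est_Q):
    """RE: empirical variance of the beta_I estimator over that of a QIF estimator."""
    return np.var(est_I, ddof=1) / np.var(est_Q, ddof=1)
```

An RE larger than 1 thus means the estimated QIF estimator is more efficient than \(\widehat{\beta }_I\).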

Table 1 Simulation results for common effect size across failure type: \(\beta =(\beta ^{(1)},\beta ^{(2)})^{\mathrm{T}}=(0.693, -0.2)^{\mathrm{T}}\) under the censoring rate 10%
Table 2 Simulation results for common effect size across failure type: \(\beta =(\beta ^{(1)},\beta ^{(2)})^{\mathrm{T}}=(0.693, -0.2)^{\mathrm{T}}\) under the censoring rate 40%
Table 3 Simulation results for common effect size across failure type: \(\beta =(\beta ^{(1)},\beta ^{(2)})^{\mathrm{T}}=(0.693, -0.2)^{\mathrm{T}}\) under the censoring rate 80%

5.3 Simulation study (2)

In practical studies, one may be interested in the failure-type-specific model

$$\begin{aligned} \lambda _{ik}(t;\widetilde{Z}_{ik}(t))=\lambda _{0k}(t) \exp \{\beta _k^{\mathrm{T}}\widetilde{Z}_{ik}(t)\}, \end{aligned}$$

which allows the regression parameters to vary with the failure type. We simulate \(K=2\) failure types in each cluster, with true parameters \(\beta _{1}=(\beta _1^{(1)},\beta _1^{(2)})^{\mathrm{T}} =(0.693,-0.2)^{\mathrm{T}}\) and \(\beta _{2}=(\beta _2^{(1)}, \beta _2^{(2)})^{\mathrm{T}}=(0.5,-0.262)^{\mathrm{T}}\). Since the cluster size is 2, we only need to consider the exchangeable working correlation structure. We consider three settings of n and censoring rate (CE): (n, CE)\(\,=\,(300,10\%), (300,40\%), (700,80\%)\). The simulation results are shown in Table 4, from which we observe results similar to those in Simulation Study (1).
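A standard way to generate a failure time from a marginal proportional hazards model is the inverse-transform method. The sketch below assumes a constant baseline hazard for illustration and deliberately ignores the intra-cluster dependence mechanism (which would be induced through a copula or frailty) and any other details of the paper's actual design; all names are hypothetical.

```python
import numpy as np

def draw_failure_time(u, z, beta, lam0=1.0):
    """Invert S(t | z) = exp(-lam0 * t * exp(beta' z)) at survival probability u.

    With a constant baseline hazard lam0, T = -log(u) / (lam0 * exp(beta' z))
    follows the marginal model lambda(t | z) = lam0 * exp(beta' z).
    """
    return -np.log(u) / (lam0 * np.exp(np.dot(beta, z)))

rng = np.random.default_rng(1)
beta_1 = np.array([0.693, -0.2])   # failure type 1, from the simulation design
z = np.array([1.0, 0.5])           # hypothetical covariate vector
t = draw_failure_time(rng.uniform(), z, beta_1)
```

A larger linear predictor yields stochastically shorter failure times, as the proportional hazards model requires.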

Table 4 Simulation results for varying effect size across failure type: \(\beta _1=(\beta _1^{(1)},\beta _1^{(2)})^{\mathrm{T}} =(0.693, -0.2)^{\mathrm{T}}, \beta _2=(\beta _2^{(1)}, \beta _2^{(2)})^{\mathrm{T}}=(0.5, -0.262)^{\mathrm{T}}\)

5.4 Performance of inference method

We also conduct simulation studies to assess the performance of the proposed chi-squared test. The data are generated from the same model as in Simulation Study (1) with \(\beta =(\beta ^{(1)}, \beta ^{(2)})^{\mathrm{T}}=(0.693,-0.2)^{\mathrm{T}}, n=200, K=4, \theta =0.25, \rho _k=0.5, \sigma =0.1\), and a censoring rate of 10%. First, we are interested in testing \(H_0: \beta ^{(1)} =0.693\) versus \(H_1: \beta ^{(1)} \ne 0.693\). Since the dimension of \(\beta ^{(1)}\) is 1, the test statistic T in (4.1) asymptotically follows \(\chi _1^2\), where \(\widetilde{\beta }^{(2)}\) is calculated by minimizing \(\varvec{\widehat{Q}}_n(0.693, \beta ^{(2)})\) with the exchangeable or AR(1) working correlation. Figure 1 shows the Q–Q plot based on 1000 replications; the plots indicate close agreement with the \(\chi _1^2\) distribution for both working correlations. We also examine the power of the proposed test under \(H_1: \beta ^{(1)} =\beta _{*}^{(1)}\). The powers at significance level \(\alpha =0.05\) are calculated as \(\beta _{*}^{(1)}\) ranges over [0.3, 0.693]. When \(\beta _{*}^{(1)} = 0.693\), i.e., when the alternative hypothesis collapses into the null hypothesis, the powers are 0.051 and 0.061 for the exchangeable and AR(1) working correlations, respectively, showing that the proposed chi-squared test attains the nominal level. Figure 2 plots the power functions of the chi-squared test for the estimated QIF method and the QIF method, each under the two working correlations, and for the EPPL method. The power functions decrease rapidly as \(\beta _{*}^{(1)}\) approaches the true value (0.693), and the power function for the exchangeable working correlation is always larger than that for the AR(1) working correlation; thus the test with the correct working correlation is more powerful than the one with a misspecified working correlation.
Nonetheless, the powers of the chi-squared test for either of the estimated QIF methods are larger than those of the other two methods. In addition, the power for the EPPL method, which utilizes the auxiliary information, is larger than that for the QIF method based on the validation set only. Similarly, we also consider testing \(H_0: \beta ^{(2)} = -0.2\) versus \(H_1: \beta ^{(2)} \ne -0.2\), and compute the powers under \(H_1: \beta ^{(2)} = \beta _{*}^{(2)}\) with \( \beta _{*}^{(2)}\) varying in \([-0.2, 0]\). Similar results are obtained but are not presented here due to space limitations.
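Given replicated values of the test statistic T, the empirical size or power at level \(\alpha = 0.05\) is simply the rejection frequency against the \(\chi _1^2\) critical value. A minimal sketch follows; computing T itself requires minimizing the estimated QIF, which is not reproduced here.

```python
import numpy as np
from scipy.stats import chi2

def empirical_power(T_values, df=1, alpha=0.05):
    """Fraction of replications in which T exceeds the chi-squared critical value."""
    crit = chi2.ppf(1.0 - alpha, df)   # roughly 3.841 for df = 1
    return np.mean(np.asarray(T_values) > crit)
```

Under the null, T is asymptotically \(\chi _1^2\), so this fraction should be close to 0.05, consistent with the reported levels of 0.051 and 0.061.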

Fig. 1

Q–Q plot for the test statistic versus \(\chi _1^2\) under \(H_0\) for 1000 replications

Fig. 2

Power functions of the chi-squared test for the proposed method, the EPPL method and the QIF method under \(H_0: \beta ^{(1)} =0.693, H_1: \beta ^{(1)} = \beta _{*}^{(1)}\)

6 Analysis of SOLVD data

We apply the proposed method to the Left Ventricular Dysfunction (SOLVD 1991) study in this section. The SOLVD study was a randomized, double-masked, placebo-controlled trial conducted between 1986 and 1991, with a three-year recruitment period and a two-year follow-up. The basic inclusion criteria for the prevention trial were: age between 21 and 80 years, inclusive; no overt symptoms of congestive heart failure; and left ventricular EF less than 35%. EF is a number between 0 and 100 that measures the efficiency of the heart in ejecting blood. A total of 4228 patients with asymptomatic left ventricular dysfunction were randomly assigned to receive either enalapril or placebo at one of the 83 hospitals linked to 23 centers in the United States, Canada, and Belgium. Liu et al. (2009) and Liu et al. (2010) analyzed these data without considering the intra-cluster correlation.

The primary clinical issues of interest are the effects of covariates on the risk of heart failure and on the first nonfatal myocardial infarction (MI) after adjusting for the confounding variables. The covariates of interest are ejection fraction, patient’s gender (SEX), coded 1 for male and 0 for female, treatment (TRT), coded 1 for enalapril and 0 for placebo, and patient’s age (AGE), measured in years. In the SOLVD study, the covariates SEX, TRT and AGE were recorded for almost all of the patients, but only 108 of the 4228 patients had their ejection fraction accurately measured using a standardized radionucleotide technique (LVEF). A related nonstandardized measure (EF) was ascertained for all the patients. Therefore, the nonstandardized measure (EF) serves as a surrogate for the standardized measure (LVEF) in this case.

In terms of the notation of the previous sections, we set \(X_{ik}=LVEF_{ik}, A_{ik}=EF_{ik}, Z_{ik}=(TRT_{ik}, SEX_{ik}, AGE_{ik})^{\mathrm{T}}\), where k denotes the failure type, with \(k=1\) for heart failure and \(k=2\) for nonfatal MI, and i denotes the patient, with \(i=1, \ldots , 4228\). Let \(\beta _k = (\beta _{1k}, \beta _{2k}^{\mathrm{T}})^{\mathrm{T}}\) denote the unknown regression coefficients. We fit the following marginal hazards model to the SOLVD data:

$$\begin{aligned} \lambda _{ik}(t; X_{ik}, Z_{ik}) = \lambda _{0k}(t) \exp (\beta _{1k}X_{ik} + \beta _{2k}^{\mathrm{T}} Z_{ik}). \end{aligned}$$

Since the primary covariate LVEF is continuous and severely incomplete, we need to estimate the induced relative risk function using the validation set. Furthermore, Liu et al. (2009) found that, given EF, LVEF is conditionally independent of the remaining covariates, so \(\psi _{ik}(t;\beta )\) can be estimated through (2.8) with \(A_{ik}^{*} = A_{ik}\).

Table 5 presents the data analysis results for two methods: the proposed estimated QIF method and the EPPL method, which ignores the intra-cluster correlation between the failure times. The parameter estimates from the two methods are close, but the proposed estimated QIF method has smaller standard errors. To test whether the covariates have significant effects on the times to heart failure and nonfatal MI, we calculate p-values from both the two-sided Z-test and the chi-squared test for the two methods. The results indicate that, at the 0.05 significance level, all of the covariates are statistically significant for heart failure by the proposed method, whereas SEX is not significant by the EPPL method. By both methods, only TRT is significant for the risk of nonfatal MI. According to the proposed method, the risk of heart failure decreases by 3.92% (95% CI [1.83%, 5.97%]) with a 1% increase in LVEF, the risk increases by 2.63% (95% CI [1.43%, 3.85%]) per one-year increase in age, males have about 29.46% (95% CI [7.19%, 46.39%]) lower risk of heart failure than females, and enalapril reduces the risk by 33.77% (95% CI [20.52%, 44.80%]).
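Percent risk changes of this kind follow from exponentiating a coefficient and its Wald interval, \(100\{\exp (\widehat{\beta } \pm 1.96\,\mathrm{SE}) - 1\}\). A small sketch, using a hypothetical coefficient and standard error for illustration (the actual SOLVD estimates are in Table 5):

```python
import numpy as np

def percent_risk_change(beta_hat, se, z=1.96):
    """Percent change in hazard per one-unit covariate increase, with 95% CI."""
    lo, mid, hi = (100.0 * (np.exp(beta_hat + d * z * se) - 1.0)
                   for d in (-1, 0, 1))
    return mid, (lo, hi)

# hypothetical inputs: a log hazard ratio of -0.04 corresponds to
# roughly a 3.9% reduction in risk per unit increase in the covariate
change, ci = percent_risk_change(-0.04, 0.011)
```

A negative value is a risk reduction; the interval excludes zero exactly when the Wald Z-test rejects at the 0.05 level.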

Table 5 SOLVD data analysis results

To illustrate the prediction of the survival probability for a subject, Fig. 3 shows the estimated survival curves of heart failure and nonfatal MI for a 69-year-old male patient with an LVEF of 28% (the median LVEF), receiving enalapril. The survival curves from the two methods are very close, but the pointwise confidence intervals from the proposed method are narrower than those from the EPPL method.
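Under the fitted marginal model, the predicted survival curve for a covariate profile is \(S_k(t; X, Z) = \exp \{-\Lambda _{0k}(t) \exp (\beta _{1k}X + \beta _{2k}^{\mathrm{T}}Z)\}\). The sketch below assumes an estimated baseline cumulative hazard is already available on a time grid (in practice a Breslow-type estimator would supply it); the function names are hypothetical.

```python
import numpy as np

def predicted_survival(cum_base_hazard, beta_hat, covariates):
    """S(t | covariates) = exp(-Lambda0(t) * exp(beta' covariates)).

    cum_base_hazard : estimated baseline cumulative hazard over a time grid
    """
    risk = np.exp(np.dot(beta_hat, covariates))          # relative risk score
    return np.exp(-np.asarray(cum_base_hazard) * risk)   # survival on the grid
```

The resulting curve starts at 1 and is nonincreasing, and a larger risk score pulls the whole curve downward.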

Fig. 3

Survival curves by the proposed method (bold curve) and the EPPL method (thin curve) for heart failure and nonfatal myocardial infarction of a subject with covariates \(\hbox {LVEF}=28\%, \hbox {TRT}=1, \hbox {SEX}=1\) and \(\hbox {AGE}=69\), along with the corresponding 95% pointwise confidence intervals (dotted curves)

7 Concluding remarks

We proposed an estimated QIF approach for multivariate failure time data in which the primary covariate is ascertained only on a subset of the full cohort while auxiliary information is available for the full cohort. The proposed approach allows the censoring times for different failure types of a subject to be different. It is worth noting that in practice these censoring times are usually the same; this can be treated as a special case of our general setup, to which the proposed method remains applicable.

In this article, we consider the situation in which the auxiliary variable is continuous. The method is based on the kernel smoothing technique and is therefore nonparametric with respect to the association between the missing covariate and the corresponding auxiliary variable. The QIF method has the advantage of incorporating the intra-cluster correlation into the estimation procedure. Compared with existing methods (e.g., Liu et al. 2010) in which the intra-cluster correlation is not considered, the proposed procedure can improve the estimation efficiency without requiring the specification of the correlation structure. Another advantage of the proposed method is that it is easy to implement. In this work, we consider the situation in which the dimension of the continuous auxiliary covariates is low; further research is needed for the high-dimensional case.