Empirical likelihood based on synthetic right censored data

https://doi.org/10.1016/j.spl.2020.108962

Abstract

In this paper, we develop a Mean Empirical Likelihood (MeanEL) method for right censored data. This MeanEL approach is based on traditional empirical likelihood methods but uses synthetic data to construct an EL ratio statistic, which is shown to have a $\chi^2$ limiting distribution. Simulation studies show that the MeanEL confidence intervals tend to have more accurate coverage probabilities than other existing empirical likelihood methods. Theoretical comparisons of different EL methods are also provided under a general framework.

Introduction

Liang et al. (2019) proposed a Mean Empirical Likelihood (MeanEL) method based on synthetic pairwise mean data. Empirical simulation results in Liang et al. (2019) showed that this MeanEL method provides better results for heavy-tailed or highly skewed distributions and for exponentially tilted likelihood. However, theoretical comparisons of MeanEL with other existing EL methods, such as the Bartlett-corrected empirical likelihood (BEL) in DiCiccio et al. (1991), the adjusted empirical likelihood (AEL) in Chen et al. (2008) and the extended empirical likelihood (EEL) in Tsao and Wu (2013), were not established in Liang et al. (2019). This paper extends the MeanEL approach to right-censored data analysis. A theoretical justification of why using such synthetic data can provide more accurate coverage probabilities is also discussed.

Assume that independent and identically distributed random observations $T_1, T_2, \ldots, T_n$ with an unknown distribution function $F(t)$ are subject to right censoring, so that we only observe

$$Z_i = \min(T_i, C_i), \quad \eta_i = I\{T_i \le C_i\}, \quad i = 1, 2, \ldots, n,$$

where $C_1, C_2, \ldots, C_n$ are censoring times with distribution $G$, independent of the survival times $T_i$. We are interested in the estimation problem for a parameter $\theta = \theta(F)$. The true parameter value $\theta_0$ is the unique solution of the equation

$$E\,g(T, \theta) = 0$$

for some function $g$. In this paper, we focus on estimating equations having the true parameter value $\theta_0$ as the unique solution, since such estimating equations provide (asymptotically) unbiased estimates. There are many examples of such $g$ in the literature, where the existence and uniqueness of the solution are also discussed (see Newey and Smith, 2004). Different functions $g$ correspond to different parameters of interest. For example, if we choose $g(t, \theta) = m(t) - \theta$, then $\theta$ is the expectation of $m(T)$, i.e. $\theta = E[m(T)] = \int m(t)\,dF(t)$. Other examples include: (1) $g(t, \theta) = (t - t_0 - \theta) I\{t > t_0\}$, corresponding to $\theta$ being the mean residual life time at a given time $t_0$; (2) $g(t, \theta) = I\{t > t_0\} - \exp(-\theta)$, corresponding to $\theta$ being the cumulative hazard function at a given time $t_0$; (3) $g(t, \theta) = I\{t \le \theta\} - t_0$, corresponding to $\theta$ being the quantile function evaluated at $t_0$.

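Based on the synthetic data introduced in Liang et al. (2019), if $T$ is observed, the pairwise mean synthetic data set can be defined as

$$M = \left\{ \frac{g(T_i, \theta) + g(T_j, \theta)}{2} : 1 \le i \le j \le n \right\}, \tag{1.3}$$

which can also be written as $M = \{M_1(\theta), M_2(\theta), \ldots, M_{N_0}(\theta)\}$ with $N_0 = n(n+1)/2$. Based on the data set (1.3), the MeanEL ratio for $\theta$ is

$$R(\theta) = \sup\left\{ \prod_{k=1}^{N_0} N_0 p_k : \sum_{k=1}^{N_0} p_k M_k(\theta) = 0,\ \sum_{k=1}^{N_0} p_k = 1,\ p_k \ge 0,\ k = 1, 2, \ldots, N_0 \right\}.$$

Under some regularity assumptions, Liang et al. (2019) proved that the mean empirical log-likelihood ratio satisfies

$$L(\theta_0) = \frac{-2 \log R(\theta_0)}{n+1} \to \chi^2(1) \quad \text{in distribution}.$$

Therefore, the $(1-\alpha)$ confidence interval can be constructed as $I = \{\theta : L(\theta) < \chi^2_\alpha(1)\}$.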
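To make the uncensored construction concrete, the following is a minimal sketch (not the authors' implementation) that evaluates the scaled statistic $L(\theta)$ for the mean, i.e. with $g(t, \theta) = t - \theta$. The function names `pairwise_means`, `neg2_log_el_ratio` and `mean_el_statistic` are hypothetical, and the inner maximisation is solved through the usual scalar Lagrange dual by bisection, which is an assumed choice of solver.

```python
import math
from itertools import combinations_with_replacement


def pairwise_means(values):
    """All (v_i + v_j) / 2 with i <= j, giving N0 = n(n+1)/2 synthetic points."""
    return [(a + b) / 2.0 for a, b in combinations_with_replacement(values, 2)]


def neg2_log_el_ratio(m, tol=1e-12, max_iter=200):
    """Return -2 log R for the constraint sum_k p_k m_k = 0 (scalar case).

    Uses the dual form p_k = 1 / (N0 (1 + lam * m_k)), where lam solves
    sum_k m_k / (1 + lam * m_k) = 0. Bisection is an assumed solver choice.
    """
    lo_m, hi_m = min(m), max(m)
    if not (lo_m < 0.0 < hi_m):
        return math.inf  # zero is not inside the convex hull of the data
    # lam must keep every 1 + lam * m_k strictly positive
    lo, hi = -1.0 / hi_m, -1.0 / lo_m
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        score = sum(x / (1.0 + mid * x) for x in m)
        if score > 0.0:  # the score is strictly decreasing in lam
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * x) for x in m)


def mean_el_statistic(t, theta):
    """Scaled statistic L(theta) = -2 log R(theta) / (n + 1) for g(t, theta) = t - theta."""
    g = [ti - theta for ti in t]
    return neg2_log_el_ratio(pairwise_means(g)) / (len(t) + 1)


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]  # illustrative numbers, not data from the paper
    for theta in (3.0, 4.0, 5.0):
        print(theta, mean_el_statistic(sample, theta))
```

A confidence interval would then collect the values of `theta` whose statistic falls below the chosen $\chi^2(1)$ critical value; a grid search over `theta` is one simple, if crude, way to do that.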

However, the above approach is not readily available under censoring, since we only observe $(Z_i, \eta_i)$ instead of $T_i$, and the censoring indicators $\eta_i$ cannot be averaged pairwise directly. Therefore, we need to develop a new approach to construct, under right censoring, a synthetic data set, an estimating equation and a MeanEL ratio for $\theta$.

This paper is organized as follows. In Section 2, we present the MeanEL methodology for right censored data and show that the MeanEL statistic still has a limiting $\chi^2$ distribution, which can be used to construct a MeanEL-based confidence interval. Simulation studies are presented in Section 3; they demonstrate that MeanEL outperforms the existing methods, especially for heavy-tailed distributions. Section 4 provides a real data analysis. A theoretical high-order accuracy justification of the different methods is provided in Section 5.

Section snippets

Methodology for censored data

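Let $Z_{1:n} \le Z_{2:n} \le \cdots \le Z_{n:n}$ be the ordered $Z$-values and $\eta_{[i:n]}$ be the concomitant of the $i$th order statistic, that is, $\eta_{[i:n]} = \eta_j$ if $Z_{i:n} = Z_j$. Let $G_i = G(Z_{i:n})$, $g_i(\theta) = g(Z_{i:n}, \theta)$ and $\delta_i = \eta_{[i:n]}$. We define the pairwise mean data set as

$$C = \left\{ \left( \frac{g_i(\theta) + g_j(\theta)}{2},\ \delta_i \delta_j \right) : 1 \le i < j \le n \right\}.$$

In this new data set $C$, only those observations satisfying $\delta_i = \delta_j = 1$ can be treated as uncensored. The following equation can be easily proved:

$$E\left[ \frac{\{g_i(\theta_0) + g_j(\theta_0)\}\,\delta_i \delta_j}{2(1 - G_i)(1 - G_j)} \right] = 0, \quad i < j.$$

Based on this equation, the MeanEL ratio can be defined as $R_C(\theta_0) = \sup \prod_{k=1}$ …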
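As a rough illustration of the weighted pairwise terms that appear inside the expectation above, the sketch below computes $\{g_i(\theta) + g_j(\theta)\}\delta_i\delta_j / \{2(1 - G_i)(1 - G_j)\}$ for all $i < j$ after sorting by $Z$. The function name `weighted_pairwise_terms` is hypothetical, and the censoring distribution `G` is passed in as a known function purely for illustration; the snippet shown here does not say how $G$ is handled in the paper, so that part is an assumption.

```python
def weighted_pairwise_terms(z, eta, theta, g, G):
    """Weighted pairwise terms for i < j, with observations ordered by Z.

    z, eta : observed times and censoring indicators (1 = uncensored)
    g      : estimating function g(t, theta)
    G      : censoring distribution function, assumed known here (illustration only)
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    g_vals = [g(z[i], theta) for i in order]   # g_i(theta) = g(Z_{i:n}, theta)
    delta = [eta[i] for i in order]            # concomitant indicators delta_i
    G_vals = [G(z[i]) for i in order]          # G_i = G(Z_{i:n})
    n = len(order)
    terms = []
    for i in range(n):
        for j in range(i + 1, n):
            if delta[i] * delta[j] == 0:
                # a pair involving a censored observation contributes zero
                terms.append(0.0)
            else:
                weight = 2.0 * (1.0 - G_vals[i]) * (1.0 - G_vals[j])
                terms.append((g_vals[i] + g_vals[j]) / weight)
    return terms


if __name__ == "__main__":
    mean_g = lambda t, theta: t - theta  # g(t, theta) = t - theta, i.e. the mean
    no_censoring = lambda t: 0.0         # G identically zero: nothing is censored
    print(weighted_pairwise_terms([3.0, 1.0, 2.0], [1, 1, 1], 2.0, mean_g, no_censoring))
```

With `G` identically zero and all indicators equal to one, the terms reduce to the plain pairwise means for $i < j$, which is a quick sanity check on the weighting.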

Simulation studies

For a given sample size $n$, we generate lifetime observations $T_1, T_2, \ldots, T_n$ from a specific distribution $F$ and censoring time observations $C_1, C_2, \ldots, C_n$ from a censoring distribution $G$. Then, based on the simulated data, we can compare the performance of the IC-confidence interval (He et al., 2016), the ScaledEL-confidence interval (Wang and Jing, 2001) and the MeanEL-confidence intervals $I_A$, $I_B$ proposed in the previous section.
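The data-generating step described above can be sketched as follows. The exponential choices for $F$ and $G$, the rate values and the function name `simulate_right_censored` are all assumptions made for illustration; the snippet shown here does not state which distributions the paper's simulations use.

```python
import random


def simulate_right_censored(n, rate_t=1.0, rate_c=0.5, seed=0):
    """Draw lifetimes T and censoring times C, then return the observed data.

    Exponential F and G with the given rates are illustrative assumptions.
    Returns (z, eta, t, c) with z_i = min(t_i, c_i) and eta_i = I{t_i <= c_i}.
    """
    rng = random.Random(seed)
    t = [rng.expovariate(rate_t) for _ in range(n)]  # latent lifetimes
    c = [rng.expovariate(rate_c) for _ in range(n)]  # censoring times
    z = [min(ti, ci) for ti, ci in zip(t, c)]
    eta = [1 if ti <= ci else 0 for ti, ci in zip(t, c)]
    return z, eta, t, c


if __name__ == "__main__":
    z, eta, _, _ = simulate_right_censored(200)
    print("observed censoring proportion:", 1.0 - sum(eta) / len(eta))
```

Only `z` and `eta` would be passed to an interval procedure; the latent `t` and `c` are returned so that coverage of a known true parameter can be checked across replications.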

In our simulation, the parameter of interest, $\theta$, is the mean of $T$, therefore

Real data analysis

In this section, we compare our proposed methods with existing methods using the primary biliary cirrhosis (PBC) dataset, which is described in Fleming and Harrington (1991) and originates from a Mayo Clinic trial conducted between 1974 and 1984. It contains the survival times of 312 patients and a status variable, which indicates whether each patient's survival time is censored. We use this dataset to illustrate our proposed method described in Section 2. Fig. 2 presents the 95% confidence intervals for the

Theoretical comparisons

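In this section, we present a theoretical comparison of MeanEL and other EL methods. Define $A_k = n^{-1} \sum_{i=1}^{n} g_i^k(\theta_0) - \alpha_k$ with $\alpha_k = E\,g_i^k(\theta_0)$. Following Liu and Chen (2010), we assume that $\alpha_2 = 1$; then the original EL can be written as

$$R_O(\theta_0) = n(R_1 + R_2 + R_3)^2 + O_p(n^{-3/2}),$$

and the Bartlett correction uses the corrected statistic

$$R_B(\theta_0) = n\left(1 - \frac{b}{n}\right)(R_1 + R_2 + R_3)^2 + O_p(n^{-3/2}),$$

where $b = \frac{1}{2}\alpha_4 - \frac{1}{3}\alpha_3^2$, $R_1 = A_1$, $R_2 = \frac{1}{3}\alpha_3 A_1^2 - \frac{1}{2} A_1 A_2$ and

$$R_3 = \frac{3}{8} A_1 A_2^2 + \frac{4}{9}\alpha_3^2 A_1^3 - \frac{5}{6}\alpha_3 A_1^2 A_2 + \frac{1}{3} A_1^2 A_3 - \frac{1}{4}\alpha_4 A_1^3.$$

Then the corrected statistic $R_B(\theta_0)$ gives second order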

CRediT authorship contribution statement

Wei Liang: Conceptualization, Methodology, Simulation, Writing. Hongsheng Dai: Methodology, Development, Main contribution in editing.

Acknowledgments

The first author is supported by the National Natural Science Foundation of China, 11701484; the Fundamental Research Funds for the Central Universities in China, 20720190067.

References (13)

  • Liang, W., et al. Mean empirical likelihood. Comput. Statist. Data Anal. (2019)
  • Chen, J., et al. Adjusted empirical likelihood and its properties. J. Comput. Graph. Statist. (2008)
  • DiCiccio, T., et al. Empirical likelihood is Bartlett-correctable. Ann. Statist. (1991)
  • Fleming, T.R., et al. Counting Processes and Survival Analysis. (1991)
  • He, S.Y., et al. Empirical likelihood for right censored lifetime data. J. Amer. Statist. Assoc. (2016)
  • Lee, A.J. U-Statistics: Theory and Practice. (1990)
There are more references available in the full text version of this article.
