Split sample empirical likelihood

https://doi.org/10.1016/j.csda.2020.106994

Abstract

Empirical likelihood offers a nonparametric approach to estimation and inference, which replaces the probability density-based likelihood function with a function defined by estimating equations. While this eliminates the need for a parametric specification, the need for numerical optimization greatly decreases the applicability of empirical likelihood for large data problems. A solution to this problem is the split sample empirical likelihood; this variant utilizes a divide and conquer approach, allowing for parallel computation of the empirical likelihood function. The results show that the asymptotic distributions of the estimators and test statistics derived from the split sample empirical likelihood are the same as those seen in standard empirical likelihood, while computation times are significantly decreased.

Introduction

Empirical likelihood (Owen, 1988, Owen, 1990) is a data driven likelihood that does not require specification of the data generating mechanism (specifically the probability function). Under mild regularity conditions empirical likelihoods inherit the asymptotic properties of the Fisher likelihood (Wilks, 1938, Qin and Lawless, 1994), and with very few exceptions (for instance Lazar and Mykland, 1999) permit a Bartlett correction (DiCiccio et al., 1991). Furthermore, empirical likelihood can be extended to quantiles, Huber’s location M estimate (Huber, 1964), and any functional that has a Fréchet derivative. The general form and first order asymptotics of the empirical likelihood are explored by Qin and Lawless (1994).

Empirical likelihood methods have been extended to many problems such as bivariate means (Owen, 1990); constrained empirical likelihood which includes the creation of a conditional empirical likelihood, Euclidean likelihood which allows the confidence region to extend beyond the convex hull and triangular array empirical likelihood which relaxes the assumption of the data being identically distributed (Owen, 1991); regression models, correlation models, ANOVA and variance modeling (Owen, 1991, Owen, 2001); the entire class of projection pursuit models (Owen, 1992, Kolaczyk, 1994); time series modeling (Owen, 2001, Kitamura, 1997); incorporating information from multiple moments for a parameter, two sample problems with a common mean, probability measure, incomplete information problems (Qin and Lawless, 1994); ratios of parameters and logistic regression (Qin and Lawless, 1995); partially linear models (Shi and Lau, 2000); missing response problems (Qin and Zhang, 2007); finite population inference (Chen and Qin, 1993); and Bayesian settings (Grendar and Judge, 2010, Lazar, 2003).

Many modern applications involve questions of interest that lead to complex models and to atypical probabilistic distributions. In this context, the ability to relax parametric assumptions makes empirical likelihood a very promising tool. However, modern applications can involve a very large number of observations and variables, leading to serious computational obstacles for empirical likelihood. The empirical likelihood cannot typically be written in a closed form; as a consequence, the solution requires numerical optimization, resulting in non-trivial and time-consuming computations. Even if the time consideration is ignored, there is an additional limitation: as sample sizes (and the number of parameters) increase, the optimization routines can quickly exceed system memory, making the parameter estimates (let alone any inferential bounds) impossible to compute.

If the computational and memory limitations of empirical likelihood could be solved, this would reintroduce a very flexible and robust method as a potential solution to many modern statistical and scientific questions of interest. One approach to large sample problems is the notion of divide and conquer, which has been used, for instance, in regression (Chen and Xie, 2014, Song and Liang, 2015) and Monte Carlo simulations (Lindsten et al., 2017). Following this idea, we introduce a variation of empirical likelihood which maintains standard properties and can use parallel computation to dramatically reduce computation time. We call this construct split sample empirical likelihood (SSEL).

The construction of the empirical likelihood (which we will refer to as the full-sample empirical likelihood), like all likelihoods, starts with a sample from some distribution. Specifically, let $x_1,\dots,x_n$ be $d$-variate independent identically distributed observations from some cumulative distribution $F$. The empirical likelihood function is

$$L(F)=\prod_{i=1}^{n}dF(x_i)=\prod_{i=1}^{n}\Pr(X=x_i)=\prod_{i=1}^{n}u_i.$$

The empirical likelihood function is maximized by the empirical distribution function, $L(F_n)=\prod_{i=1}^{n}n^{-1}$, and the empirical likelihood ratio function $R(F)=L(F)/L(F_n)$ can be written as $R(F)=\prod_{i=1}^{n}nu_i$. Suppose now we are interested in the estimation of a $p\times 1$ parameter $\theta$. We add additional constraints in the form of $r\ge p$ unbiased estimating equations $g(x,\theta)$, i.e. $E\,g(X,\theta_0)=0$,

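which along with constraints on $u_i$ give the profile empirical likelihood ratio function

$$R_E(\theta)=\sup\left\{\prod_{i=1}^{n}nu_i \;\middle|\; u_i\ge 0,\ \sum_{i=1}^{n}u_i=1,\ \sum_{i=1}^{n}u_i\,g(x_i,\theta)=0\right\}. \tag{1}$$

Provided that $\theta$ is inside the convex hull of the set $g(x_1,\theta),\dots,g(x_n,\theta)$ a unique value of Eq. (1) exists (Owen, 1988). $R_E(\theta)$ is undefined for all $\theta$ not inside the convex hull. The convex hull of the point cloud $S$ is the set

$$\mathrm{Conv}(S)=\left\{\sum_{i=1}^{|S|}\alpha_i x_i \;\middle|\; (\forall i:\alpha_i\ge 0)\wedge\sum_{i=1}^{|S|}\alpha_i=1\right\}.$$

The implications of solutions only existing inside the convex hull will be further explored in Section 2.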
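To make the profile empirical likelihood ratio concrete, the following is a minimal sketch for the simplest case, a scalar mean with estimating equation g(x, θ) = x − θ. It relies on the standard Lagrange multiplier form of the solution, in which the optimal weights are u_i = 1 / (n (1 + λ (x_i − θ))) and λ solves Σ (x_i − θ) / (1 + λ (x_i − θ)) = 0. The function name `log_el_ratio_mean`, the bisection solver and the boundary safeguard are illustrative choices, not taken from the article. In this scalar case the convex hull condition reduces to min(x) < θ < max(x).

```python
import math

def log_el_ratio_mean(xs, theta):
    """Log profile empirical likelihood ratio for a scalar mean.

    Illustrative sketch: g(x, theta) = x - theta, solved by bisection on
    the Lagrange multiplier. Returns None when theta lies outside the
    convex hull of the data, where the ratio is undefined.
    """
    z = [x - theta for x in xs]
    zmin, zmax = min(z), max(z)
    if not (zmin < 0.0 < zmax):
        return None  # theta is not strictly inside (min(x), max(x))

    # All weights stay positive only if 1 + lam * z_i > 0 for every i,
    # which restricts lam to the open interval (-1/zmax, -1/zmin).
    lo, hi = -1.0 / zmax, -1.0 / zmin
    eps = 1e-10 * (hi - lo)  # stay strictly inside the interval (assumption)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        # Strictly decreasing in lam; its root gives the optimal multiplier.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(100):  # fixed number of bisection steps
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    # n * u_i = 1 / (1 + lam * z_i), so log R = -sum(log(1 + lam * z_i)).
    return -sum(math.log1p(lam * zi) for zi in z)
```

Evaluating the function at the sample mean returns a value numerically equal to zero (the ratio is one there), and values of θ further from the sample mean give more negative log ratios. A production implementation for vector-valued estimating equations would need a multivariate solver in place of the bisection used here.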

The rest of the article is as follows. We formally define split sample empirical likelihood in Section 2, along with necessary conditions and notation. Section 3 includes all relevant lemmas, theorems and corollaries, which show that both the estimators and test statistics have the same asymptotic distributions seen in full-sample empirical likelihood. Section 4 contains a simulation study to demonstrate how to properly construct a split sample empirical likelihood, along with results indicating that SSEL has a significantly decreased computation time. Section 5 discusses some practical issues when using SSEL.

Section snippets

Split sample empirical likelihood

The core approach of SSEL is splitting a data set in order to create multiple empirical likelihood functions, with each observation (and all corresponding variables) appearing in only one subset. This notion of separating data to create multiple likelihoods follows directly from composite likelihood (Lindsay, 1988, Varin et al., 2011), but the motivation is different: instead of using multiple likelihood components due to an inability to fully express the true likelihood, we use multiple

Asymptotic properties

Given that the main goal of the likelihood function is to perform inferential tests, we give results pertaining to the asymptotic behavior of both the parameter estimator and the test statistic. The results parallel those in Qin and Lawless (1994). Lemma 1 generalizes Lemma 1 of Qin and Lawless (1994) to account for SSEL being composed of multiple standard empirical likelihood pieces.

Lemma 1

Assume $E[gg^{T}]$ is positive definite with rank $p$, $\partial g(x^{(j)},\theta)/\partial\theta$ is continuous in a neighborhood of the true value $\theta_0$

Simulations

The simulations are designed to demonstrate how to appropriately divide the data into the J likelihood components, and to confirm that the resultant estimates for a given data set do not differ significantly from those obtained using the full-sample empirical likelihood. The simulations also show that by optimizing each likelihood component in parallel the computation time is significantly decreased.
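The divide-and-evaluate pattern described above can be sketched as follows for the scalar mean case. Everything here is illustrative rather than the authors' implementation: the helper names `split_sample` and `ssel_log_ratio` are hypothetical, the random disjoint split is one possible way to form the J components, and combining the components by summing their log ratios is an assumption made by analogy with composite likelihood. A thread pool is used only to keep the sketch portable; real speed-ups for CPU-bound work would typically need separate processes or machines.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_el_ratio_mean(xs, theta):
    """Log empirical likelihood ratio for a scalar mean (bisection sketch)."""
    z = [x - theta for x in xs]
    zmin, zmax = min(z), max(z)
    if not (zmin < 0.0 < zmax):
        return None  # theta outside the convex hull of this subset
    lo, hi = -1.0 / zmax, -1.0 / zmin
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * zi) for zi in z)

def split_sample(xs, J, seed=0):
    """Randomly partition xs into J disjoint subsets (hypothetical helper).

    Each observation appears in exactly one subset.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    return [[xs[i] for i in idx[j::J]] for j in range(J)]

def ssel_log_ratio(xs, theta, J, seed=0):
    """Sum of per-subset log ratios, evaluated concurrently.

    Assumption: components are combined additively on the log scale.
    Returns None if theta is outside the convex hull of any subset.
    """
    pieces = split_sample(xs, J, seed)
    with ThreadPoolExecutor(max_workers=J) as pool:
        parts = list(pool.map(lambda p: log_el_ratio_mean(p, theta), pieces))
    if any(v is None for v in parts):
        return None
    return sum(parts)
```

Because every subset must contain θ inside its own convex hull, a larger J makes it more likely that some component is undefined at a given θ, which is one reason a conservative choice of J can be safer.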

We generate data from a bivariate Poisson distribution, which has the following probability mass

Discussion

We have found that, in terms of accuracy, it is always better to be conservative with the choice of J. All our simulations have indicated that there are diminishing returns as J increases, assuming the sample size stays constant. We have found that smaller choices of J are usually better in terms of guaranteeing convergence of the algorithm and consistent results. As an example, we have seen situations where J=2 is 3 times faster than empirical likelihood (8 h versus 24 h) and produces nearly identical

References (26)

  • Shi, J., et al., 2000. Empirical likelihood for partially linear models. J. Multivariate Anal.
  • Box, G.E.P., 1954. Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Ann. Math. Stat.
  • Chen, J., et al., 1993. Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika.
  • Chen, X., et al., 2014. A split-and-conquer approach for analysis of extraordinarily large data. Statist. Sinica.
  • DiCiccio, T.J., et al., 1991. Empirical likelihood is Bartlett-correctable. Ann. Statist.
  • Grendar, M., Judge, G.G., 2010. Revised empirical likelihood. Technical Report 1106. CUDARE Working Paper Series....
  • Huber, P.J., 1964. Robust estimation of a location parameter. Ann. Math. Stat.
  • Kitamura, Y., 1997. Empirical likelihood methods with weakly dependent processes. Ann. Statist.
  • Kolaczyk, E.D., 1994. Empirical likelihood for generalized linear models. Statist. Sinica.
  • Lazar, N.A., 2003. Bayesian empirical likelihood. Biometrika.
  • Lazar, N.A., et al., 1999. Empirical likelihood in the presence of nuisance parameters. Biometrika.
  • Lindsay, B.G., 1988. Composite likelihood methods. Contemp. Math.
  • Lindsten, F., et al., 2017. Divide-and-conquer with sequential Monte Carlo. J. Comput. Graph. Statist.
Footnote 1. This material was based upon work partially supported by the National Science Foundation, United States of America under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
