1 Introduction

Let \(\{F_{\theta }\), \(\theta \in {\varTheta }\}\) be a family of absolutely continuous distribution functions on the real line and denote the corresponding densities by \(\{f_{\theta }\), \(\theta \in {\varTheta }\}\). For any convex function \(\phi \) on the positive half of the real line, the quantity

$$\begin{aligned} S_\phi (\theta )=\int _{-\infty }^{\infty }\phi \left( \frac{f_{\theta }(x)}{f_{\theta _0}(x)}\right) f_{\theta _0}(x)\mathrm{d}x \end{aligned}$$

is called the \(\phi \)-divergence between the distributions \(F_{\theta }\) and \(F_{\theta _0}\). The \(\phi \)-divergences, introduced by Csiszár (1963) as information-type measures, have several statistical applications including estimation. Although Csiszár (1977) describes how this measure can also be used for discrete distributions, we are concerned with the case of absolutely continuous distributions in the present paper.

Let \(\xi _1,\ldots ,\xi _{n-1}\) be a sequence of independent and identically distributed (i.i.d.) random variables (r.v.’s) from \(F_{\theta _0}\), \(\theta _0\in {\varTheta }\). The goal is to estimate the unknown parameter \(\theta _0\). If \(\phi (x)=-\log x\), then \(S_\phi (\theta )\) is known as the Kullback–Leibler divergence between \(F_{\theta }\) and \(F_{\theta _0}\). In this case, a consistent estimator of this \(S_\phi (\theta )\) is given by

$$\begin{aligned} \frac{1}{n-1}\sum _{i=1}^{n-1}\phi \left( \frac{f_{\theta }(\xi _i)}{f_{\theta _0}(\xi _i)}\right) =-\frac{1}{n-1}\sum _{i=1}^{n-1}\log \frac{f_{\theta }(\xi _i)}{f_{\theta _0}(\xi _i)} . \end{aligned}$$
(1)

Minimization of this statistic with respect to \(\theta \) is equivalent to maximization of the log-likelihood function, \(\sum _{i=1}^{n-1}\log f_{\theta }(\xi _i)\). Thus, by finding a value of \(\theta \in {\varTheta }\) that minimizes (1), we obtain the well-known maximum likelihood estimator (MLE) of \(\theta _0\). Note that, in order to minimize the right-hand side of (1) with respect to \(\theta \), we do not need to know the value of \(\theta _0\). On the other hand, for convex functions other than \(\phi (x)=-\log x\), minimization of the left-hand side of (1) with respect to \(\theta \) would require knowledge of \(\theta _0\), the very parameter that is to be estimated. Thus, for general convex \(\phi \) functions, it is not obvious how to approximate \(S_\phi (\theta )\) in order to obtain a statistic that can be used for estimating the parameters.

One solution to this problem was provided by Beran (1977), who proposed that \(f_{\theta _0}\) could be estimated by a suitable nonparametric estimator \({\hat{f}}\) (e.g., a kernel estimator) in the first stage, and in the second stage, the estimator of \(\theta _0\) should be chosen as any parameter value \(\theta \in {\varTheta }\) that minimizes the approximation

$$\begin{aligned} {{\hat{S}}}_{\phi }(\theta )=\int _{-\infty }^{\infty }{\hat{f}}(x)\phi \left( \frac{f_{\theta }(x)}{{\hat{f}}(x)}\right) \mathrm{d}x \end{aligned}$$
(2)

of \(S_\phi (\theta )\). In the estimation method suggested by Beran (1977), the function \(\phi (x)= \frac{1}{2}|1-\sqrt{x}|^2\) was used, and this particular case of \(\phi \)-divergence is known as the squared Hellinger distance. Read and Cressie (1988, p. 124) put forward a similar idea based on power divergences, and it should be noted that the family of power divergences is a subclass of the family of \(\phi \)-divergences. The general approach of estimating parameters by minimizing a distance (or a divergence) between a nonparametric density estimate and the model density over the parameter space has been further extended by subsequent authors, and many of these procedures combine strong robustness features with asymptotic efficiency. See Basu et al. (2011) and the references therein for details.

Here, we propose an alternate approach obtained by approximating \(S_\phi (\theta )\) using the sample spacings. Let \(\xi _{(1)} \le \cdots \le \xi _{(n-1)}\) denote the ordered sample of \(\xi _1,\ldots ,\xi _{n-1}\), and let \(\xi _{(0)}=-\,\infty \) and \(\xi _{(n)}=\infty \). For a positive integer \(m=m(n)\), small relative to \(n\), we put \(k=n/m\). Without loss of generality, when stating asymptotic results, we may assume that \(k=k(n)\) is an integer and define non-overlapping mth-order spacings as

$$\begin{aligned} D_{j,m}(\theta )=F_{\theta }(\xi _{(jm)})-F_{\theta }(\xi _{((j-1)m)}),\,\,\,j=1,\ldots ,k. \end{aligned}$$

Let

$$\begin{aligned} S_{\phi ,n}(\theta )=\frac{1}{k}\sum _{j=1}^{k}\phi (kD_{j,m}(\theta )),\,\,\,\theta \in {\varTheta }. \end{aligned}$$
(3)

In Eq. (3), the reciprocal of the argument of \(\phi \) is related to a nonparametric histogram density estimator considered in Prakasa Rao (1983, Section 2.4). More precisely, \(g_n(x)=(kD_{j,m}(\theta ))^{-1}\), for \(F_{\theta }(\xi _{((j-1)m)})\le x<F_{\theta }(\xi _{(jm)})\), is an estimator of the density of the r.v. \(F_{\theta }(\xi _1)\), i.e., of \(g(x)=f_{\theta _0}(F_\theta ^{-1}(x))/f_{\theta }(F_\theta ^{-1}(x))\), where \(F_\theta ^{-1}(x)=\inf \{u:F_\theta (u)\ge x\}\). When both k and m increase with the sample size, then, for large n,

$$\begin{aligned} kD_{j,m}(\theta )\approx \frac{f_\theta (\xi _{((j-1)m+\lfloor m/2\rfloor )})}{f_{\theta _0}(\xi _{((j-1)m+\lfloor m/2\rfloor )})},\,\,\,j=1,\ldots ,k, \end{aligned}$$
(4)

where \(\lfloor m/2 \rfloor \) denotes the largest integer less than or equal to \(m/2\). Thus, intuitively, if \(k,m\rightarrow \infty \) as \(n\rightarrow \infty \), then \(S_{\phi ,n}(\theta )\) should converge in probability to \(S_\phi (\theta )\). Furthermore, since \(\phi \) is a convex function, by Jensen’s inequality, we have \(S_\phi (\theta )\ge S_\phi (\theta _0)=\phi (1)\). This suggests that if the distribution \(F_{\theta }\) is a smooth function of \(\theta \), then an argument minimizing \(S_{\phi ,n}(\theta )\), \(\theta \in {\varTheta }\), should be close to the true value \(\theta _0\) and hence be a reasonable estimator.

An argument \(\theta ={\hat{\theta }}_{\phi ,n}\) which minimizes \(S_{\phi ,n}(\theta )\), \(\theta \in {\varTheta }\), will be referred to as a Generalized Spacings Estimator (GSE) of \(\theta _0\). When convenient, a root of the equation \(({\mathrm{d}}/{\mathrm{d}}\theta )S_{\phi ,n}(\theta )=0\) will also be referred to as a GSE.
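To make the construction concrete, the following is a minimal sketch (not from the paper) of the criterion (3) and of a GSE for a one-parameter model; the \(N(\theta ,1)\) location family, the choice of an \(m\) that divides \(n\), the function names, and the use of a bounded scalar minimizer are all our illustrative assumptions.

```python
# Sketch of S_{phi,n}(theta) in (3) and of the corresponding GSE for a
# hypothetical N(theta, 1) location family.  The sample has n - 1 observations
# and m is assumed to divide n, so k = n/m, as in the text.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def gse_criterion(theta, sample, m, phi=lambda x: -np.log(x)):
    """S_{phi,n}(theta): mean of phi(k * D_{j,m}(theta)) over the k spacings."""
    n = len(sample) + 1                       # the paper's n (sample size is n - 1)
    k = n // m                                # number of non-overlapping spacings
    u = norm.cdf(np.sort(sample), loc=theta)  # F_theta at the order statistics
    u = np.concatenate(([0.0], u, [1.0]))     # F_theta(xi_(0)) = 0, F_theta(xi_(n)) = 1
    d = u[m::m] - u[:-m:m]                    # D_{j,m}(theta), j = 1, ..., k
    return float(np.mean(phi(np.maximum(k * d, 1e-300))))  # guard against zero spacings

def gse(sample, m, phi=lambda x: -np.log(x)):
    """A GSE of the location parameter: a minimizer of the criterion above."""
    res = minimize_scalar(gse_criterion, args=(sample, m, phi),
                          bounds=(sample.min(), sample.max()), method="bounded")
    return res.x

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=839)   # n - 1 = 839 observations, n = 840
print(gse(x, m=15))                            # should be close to 2.0
```

With \(\phi (x)=-\log x\) and \(m=1\), this sketch reduces to the MSP estimator discussed below.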

By using different functions \(\phi \) and different values of m, we get various criteria for statistical estimation. The ideas behind this proposed family of estimation methods generalize the ideas behind the maximum spacing (MSP) method, as introduced by Ranneby (1984); the same method was introduced from a different point of view by Cheng and Amin (1983). Ranneby derived the MSP method from an approximation of the Kullback–Leibler divergence between \(F_{\theta }\) and \(F_{\theta _0}\), i.e., \(S_\phi (\theta )\) with \(\phi (x)=-\log x\), and defined the MSP estimator as any parameter value in \({\varTheta }\) that minimizes (3) with \(\phi (x)=-\log x\) and \(m=1\). This estimator has been shown, under general conditions, to be consistent (Ranneby 1984; Ekström 1996, 1998; Shao and Hahn 1999) and asymptotically efficient (Shao and Hahn 1994; Ghosh and Jammalamadaka 2001). Based on the maximum entropy principle, Kapur and Kesavan (1992) proposed to estimate \(\theta _0\) by selecting the value of \(\theta \in {\varTheta }\) that minimizes (3) with \(\phi (x)=x\log x\) and \(m=1\). With this particular choice of \(\phi \)-function, \(S_\phi (\theta )\) becomes the Kullback–Leibler divergence between \(F_{\theta _0}\) and \(F_{\theta }\) (rather than between \(F_{\theta }\) and \(F_{\theta _0}\)). We refer to Ekström (2008) for a survey of estimation methods based on spacings and the Kullback–Leibler divergence. Ekström (1997, 2001) and Ghosh and Jammalamadaka (2001) considered a subclass of the estimation methods proposed in the current paper, namely the GSEs with \(m=1\). Under general regularity conditions, it turns out that this subclass produces estimators that are consistent and asymptotically normal and that the MSP estimator, which corresponds to the special case \(\phi (x)=-\log x\), has the smallest asymptotic variance in this subclass (Ekström 1997; Ghosh and Jammalamadaka 2001). Estimators based on overlapping (rather than non-overlapping) mth-order spacings were considered in Ekström (1997, 2008), where small Monte Carlo studies indicated that, in an asymptotic sense, larger orders of spacings are always better (when \(\phi (x)\) is a convex function other than the negative log function). Menéndez et al. (1997, 2001a, b) and Mayoral et al. (2003) consider minimum divergence estimators based on spacings that are related to our GSEs. In their asymptotics, they use only a fixed number of spacings, and their results suggest that GSEs will not be asymptotically fully efficient when k in (3) is held fixed.

In the present paper, it is shown that GSEs are consistent and asymptotically normal under general conditions. In contrast to the aforementioned papers, we allow both the number of spacings, k, and the order of spacings, m, to increase to infinity with the sample size. We show that if both of them do tend to infinity, then there exists a class of asymptotically efficient GSEs that we call the Minimum Power Divergence Estimators (MPDEs). In contrast, if m is held fixed, then the only asymptotically optimal GSE is the one based on \(\phi (x)=-\log x\). The main results are stated in the next section, followed by a simulation study assessing (i) the performance of these estimators for different m and n and (ii) the robustness of these estimators under contamination. In the latter case, we also assess a suggested data-driven rule for choosing the order of spacings, m. Detailed proofs can be found in the Appendix and in the online Supplementary Material.

2 Main results

In this section, we state the main results and the assumptions that are needed. Unless otherwise stated, it will henceforth be assumed that \(\xi _1,\ldots ,\xi _{n-1}\) are i.i.d. r.v.’s from \(F_{\theta _0}\), \(\theta _0\in {\varTheta }\).

We will prove the consistency of the GSEs under the following assumptions:

Assumption 1

The family of distributions \(\{F_\theta (\cdot )\), \(\theta \in {\varTheta }\}\) has common support, with continuous densities \(\{f_\theta (\cdot )\), \(\theta \in {\varTheta }\}\), and \(F_\theta (x)\ne F_{\theta _0}(x)\) for at least one x when \(\theta \ne \theta _0\).

Assumption 2

The parameter space \({\varTheta }\subset R\) contains an open interval of which \(\theta _0\) is an interior point, and \(F_\theta (\cdot )\) is differentiable with respect to \(\theta \).

Assumption 3

The function \(\phi (t)\), \(t>0\), satisfies the following conditions:

  (a) it is strictly convex;

  (b) \(\min \{0,\phi (t)\}/t\rightarrow 0\) as \(t\rightarrow \infty \);

  (c) it is bounded from above by \(\psi (t)=c_1(t^{-c_2}+t^{c_3})\) for all \(t>0\), where \(c_1,c_2\), and \(c_3\) are some nonnegative constants;

  (d) it is twice differentiable.

Assumption 3 is satisfied by a wide class of convex functions, including the following:

$$\begin{aligned} \phi (x)=\phi _\lambda (x)=\left\{ \begin{array}{ll} \lambda ^{-1}(1+\lambda )^{-1}(x^{\lambda +1}-1), &{}\quad \text{ if } \,\,\lambda \ne -1,0, \\ -\log x, &{}\quad \text{ if } \,\,\lambda =-\,1,\\ x\log x, &{}\quad \text{ if } \,\,\lambda =0, \end{array}\right. \end{aligned}$$
(5)

where the case \(\lambda =-\,1\) is obtained as a pointwise limit, and the case \(\lambda =0\) follows, using that \(\lim _{\lambda \rightarrow 0}(x^\lambda -1)/\lambda =\log x\), after dropping an affine term in \(x\); affine terms in \(x\) shift \(S_\phi (\theta )\) and \(S_{\phi ,n}(\theta )\) only by constants that do not depend on \(\theta \) and hence do not affect the resulting estimator.
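For completeness, here is a small helper (ours, not from the paper) implementing \(\phi _\lambda \) with the two boundary cases written out explicitly; passing it to the `gse` sketch above, e.g., as `phi=lambda x: phi_lambda(x, -0.5)`, gives the corresponding spacings criterion.

```python
# phi_lambda from (5), with the limiting cases lambda = -1 and lambda = 0
# handled explicitly.
import numpy as np

def phi_lambda(x, lam):
    x = np.asarray(x, dtype=float)
    if lam == -1:
        return -np.log(x)
    if lam == 0:
        return x * np.log(x)
    return (x ** (lam + 1) - 1.0) / (lam * (lam + 1))
```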

Theorem 1

Under Assumptions 1–3, when \(m>c_2\) is fixed, or \(m\rightarrow \infty \) such that \(m=o(n)\), the equation \(({\mathrm{d}}/{\mathrm{d}}\theta )S_{\phi ,n}(\theta )=0\) has a root \({{\hat{\theta }}}_{\phi ,n}\) with probability tending to 1 as \(n\rightarrow \infty \), such that

$$\begin{aligned} {{\hat{\theta }}}_{\phi ,n}={\hat{\theta }}_{\phi ,n}(\xi _1,\ldots ,\xi _{n-1}){\mathop {\longrightarrow }\limits ^{p}}\theta _0. \end{aligned}$$

For the purpose of the next theorem, we will use the notation \(f(x,\theta )\) for the density \(f_\theta (x)\), and we denote its partial derivatives by

$$\begin{aligned} f_{ij} (x,\theta )=\frac{\partial ^{i+j} }{\partial x^{i} \partial \theta ^{j} }f(x,\theta ). \end{aligned}$$

Let \(W_{1},W_{2},\ldots \) be a sequence of independent standard exponentially distributed r.v.’s, \({G}_{m}=W_1+\cdots +W_m\), and \({{\bar{G}}}_m=m^{-1}G_m\).

We then have the following important result:

Theorem 2

Let \(m=o(n)\). In addition to Assumptions 1 and 2, assume the following conditions:

  (i) The function \(\phi \) is strictly convex and thrice continuously differentiable.

  (ii) The quantities \(\mathrm{Var}(W_{1} \phi '({\bar{G}}_{m} ))\), \(E(W_{1}^{2} \phi ''({\bar{G}}_{m} ))\), and \(E(W_{1}^{3} \phi '''({\bar{G}}_{m} ))\) exist and are bounded away from zero.

  (iii) The density function \(f(x,\theta )\), the inverse \(F_{\theta }^{-1} (x)\), and the partial derivatives \(f_{10}\) and \(f_{11} \) are continuous in x at \(\theta =\theta _{0}\), and \(f_{01}\), \(f_{02}\), and \(f_{03}\) are continuous in x and \(\theta \) in an open neighborhood of \(\theta _0\).

  (iv) The Fisher information,

    $$\begin{aligned} I(\theta )=\int _{-\infty }^{\infty }\left[ \frac{f_{01}(x,\theta )}{f(x,\theta )} \right] ^{2} f(x,\theta )\mathrm{d}x, \end{aligned}$$

    takes values in the interval \((0,\infty )\) for \(\theta \) in a neighborhood of \(\theta _0\).

Then, for any consistent root \(\hat{\theta }_{\phi ,n} \) of \(({\mathrm{d}}/{\mathrm{d}}\theta )S_{\phi ,n}(\theta )=0\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _x\left| P\left( \sqrt{n}({{\hat{\theta }}}_{\phi ,n}-\theta _0)\le x\right) -{\varPhi }\left( x\sqrt{\frac{I(\theta _{0})}{e_m(\phi )}}\right) \right| =0, \end{aligned}$$

where \({\varPhi }\) is the standard normal cumulative distribution function,

$$\begin{aligned} e_{m} (\phi )= & {} \sigma _{\phi }^{2} \left( E\left( {\bar{G}}_{m}^{2} \phi ''({\bar{G}}_{m} )\right) \right) ^{-2},\nonumber \\ \sigma _{\phi }^{2}= & {} m\mathrm{Var}\left( {\bar{G}}_{m} \phi '({\bar{G}}_{m} )\right) +(2m+1)\mu _{m}^{2} -2m\mu _{m} E\left( {\bar{G}}_{m} ^{2} \phi '({\bar{G}}_{m} )\right) , \end{aligned}$$
(6)

and

$$\begin{aligned} \mu _{m}=E\left( W_{1} \phi '({\bar{G}}_{m} )\right) =E\left( {\bar{G}}_{m} \phi '({\bar{G}}_{m} )\right) . \end{aligned}$$
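For the functions \(\phi _\lambda \) of (5) with \(\lambda \ne 0\), every expectation in (6) reduces to a moment \(E({\bar{G}}_m^p)=\Gamma (m+p)/(\Gamma (m)\,m^p)\), so \(e_m(\phi _\lambda )\) can be evaluated exactly. The sketch below (ours, for illustration; the function names are not from the paper) does this and illustrates the behavior discussed after the theorem: the values are at least 1, equal 1 for \(\lambda =-1\), and approach 1 as m grows.

```python
# Exact evaluation of e_m(phi_lambda) in (6) via Gamma moments, for lambda != 0
# (for lambda = 0 the derivatives of phi involve log terms, and for lambda < -1
# the moments require m > -lambda - 1).
from math import lgamma, log, exp

def moment(m, p):
    """E(Gbar_m**p), where Gbar_m = G_m / m and G_m ~ Gamma(m, 1)."""
    return exp(lgamma(m + p) - lgamma(m) - p * log(m))

def e_m_phi(lam, m):
    """e_m(phi_lambda), using phi'(x) = x**lam / lam and phi''(x) = x**(lam - 1)."""
    mu = moment(m, lam + 1) / lam                                   # E(Gbar * phi'(Gbar))
    var_gphi = (moment(m, 2 * lam + 2) - moment(m, lam + 1) ** 2) / lam ** 2
    e_g2_phi1 = moment(m, lam + 2) / lam                            # E(Gbar^2 * phi'(Gbar))
    e_g2_phi2 = moment(m, lam + 1)                                  # E(Gbar^2 * phi''(Gbar))
    sigma2 = m * var_gphi + (2 * m + 1) * mu ** 2 - 2 * m * mu * e_g2_phi1
    return sigma2 / e_g2_phi2 ** 2

for lam in (-1.0, -0.9, -0.5, 1.0):
    # equals 1 for lam = -1 (the MSP case); exceeds 1 otherwise, approaching 1 as m grows
    print(lam, [round(e_m_phi(lam, m), 4) for m in (1, 5, 15, 50, 200)])
```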

The theorems will be proved using methods related to those used for the MLE in, e.g., Lehmann and Casella (1998). A generalization to the multiparameter case is possible, much as in the case of maximum likelihood estimation. However, as in Lehmann and Casella (1998) for the MLE, this would require somewhat more complex assumptions and proofs, and we refrain from attempting it here.

For the case \(m=1\) and by assuming that \(\lim _{x\rightarrow 0}\phi '(x)x^2\mathrm{e}^{-x}=\lim _{x\rightarrow \infty }\phi '(x)x^2\mathrm{e}^{-x}=0\), Ekström (1997) and Ghosh and Jammalamadaka (2001) showed that \(e_{m} (\phi )\ge 1\), with equality if and only if \(\phi (x)=a\log x+bx+c\), for some constants a, b, and c. That this inequality holds true for general m can be seen by integrating by parts, i.e., assuming that \(\lim _{x\rightarrow 0}\phi '(x/m)x^{m+1}\mathrm{e}^{-x}=\lim _{x\rightarrow \infty }\phi '(x/m)x^{m+1}\mathrm{e}^{-x}=0\), we get

$$\begin{aligned} \left( E({\bar{G}}_{m}^{2} \phi ''({\bar{G}}_{m} ))\right) ^{2}= & {} \left( mE({\bar{G}}_{m}^{2} \phi '({\bar{G}}_{m} ))-(m+1)\mu _{m} \right) ^{2}\\= & {} \left( m\, \mathrm{Cov}\left( {\bar{G}}_{m} \phi '({\bar{G}}_{m} ),{\bar{G}}_{m} \right) -\mu _{m} \right) ^{2} \\= & {} m^{2} \left( \mathrm{Cov}\left( {\bar{G}}_{m} \phi '({\bar{G}}_{m} ),{\bar{G}}_{m} \right) \right) ^2\\&\quad +\,(2m+1)\mu _{m}^{2}-2m\mu _{m} E({\bar{G}}_{m}^{2} \phi '({\bar{G}}_{m} )) \le \sigma _{\phi }^{2} , \end{aligned}$$

where the inequality on the right-hand side follows by noting that

$$\begin{aligned} \left( \mathrm{Cov}\left( {\bar{G}}_{m} \phi '({\bar{G}}_{m} ),{\bar{G}}_{m} \right) \right) ^2\le \mathrm{Var}\left( {\bar{G}}_{m} \phi '({\bar{G}}_{m} )\right) \mathrm{Var}({\bar{G}}_{m}) \end{aligned}$$

and \(\mathrm{Var}({\bar{G}}_{m}) =m^{-1} \). Hence, \(e_{m} (\phi )=\sigma _{\phi }^{2} \left( E({\bar{G}}_{m}^{2} \phi ''({\bar{G}}_{m} ))\right) ^{-2} \ge 1\), with equality if and only if \(x\phi '(x)=a+bx\) or, equivalently, if and only if \(\phi (x)=a\log x+bx+c\), where \(a<0\). It should be observed that the corresponding estimator \({{\hat{\theta }}}_{\phi ,n}\) does not depend on the chosen values of \(a<0\), b, and c. Thus, without loss of generality, we may choose \(a=-\,1\) and \(b=c=0\), i.e., for m fixed, the asymptotically optimal estimator \({{\hat{\theta }}}_{\phi ,n}\) is based on the function \(\phi (x)=-\log x\). If, however, we let \(m\rightarrow \infty \), then the asymptotically optimal estimator is no longer unique. For example, let us consider the family of power divergences (Read and Cressie 1988; Pardo 2006) given by

$$\begin{aligned} T_{\lambda }(\theta )= & {} \frac{1}{\lambda (\lambda +1)}\int _{-\infty }^\infty \left( \left( \frac{f_\theta (x)}{f_{\theta _0}(x)}\right) ^{\lambda +1}-1\right) f_{\theta _0}(x)\mathrm{d}x, \end{aligned}$$

where the cases \(\lambda =-\,1\) and \(\lambda =0\) are given by continuity, i.e.,

$$\begin{aligned} T_{-1}(\theta )=\lim _{\lambda \rightarrow -1}T_{\lambda }(\theta )= \int _{-\infty }^{\infty }\left( -\log \left( \frac{f_\theta (x)}{f_{\theta _0}(x)}\right) \right) f_{\theta _0}(x)\mathrm{d}x \end{aligned}$$

and

$$\begin{aligned} T_{0}(\theta )=\lim _{\lambda \rightarrow 0}T_{\lambda }(\theta )= \int _{-\infty }^{\infty }\left( \frac{f_\theta (x)}{f_{\theta _0}(x)}\log \left( \frac{f_\theta (x)}{f_{\theta _0}(x)}\right) \right) f_{\theta _0}(x)\mathrm{d}x, \end{aligned}$$

where \(T_{-1}(\theta )\) is the Kullback–Leibler divergence and \(T_{0}(\theta )\) is the reversed Kullback–Leibler divergence (cf. Pardo and Pardo 2000).

If we set \(\phi (x)=\phi _\lambda (x)\), as defined in Eq. (5), then \(T_{\lambda }(\theta )=S_{\phi _\lambda }(\theta )\), i.e., the family of power divergences is a subclass of the family of \(\phi \)-divergences. The divergence \(T_{\lambda }(\theta )=S_{\phi _\lambda }(\theta )\) is estimated by \(S_{\phi _\lambda ,n}(\theta )\), and Theorem 1 establishes the existence of a consistent root of the equation \(({\mathrm{d}}/{\mathrm{d}}\theta )S_{\phi _\lambda ,n}(\theta )=0\). The next result asserts that any such sequence of roots is asymptotically normal and efficient when \(k,m\rightarrow \infty \) as \(n\rightarrow \infty \).

Corollary 1

Let \(m\rightarrow \infty \) such that \(m=o(n)\). Suppose that Assumptions 1 and 2 and conditions (iii) and (iv) of Theorem 2 hold true and that the function \(\phi _\lambda \) is defined by (5) for some \(\lambda \in (-\infty ,\infty )\). Then, for any consistent root \({\hat{\theta }}_{\lambda ,n} \) of \(({\mathrm{d}}/{\mathrm{d}}\theta )S_{\phi _\lambda ,n}(\theta )=0\), we have

$$\begin{aligned} \sqrt{n} ({\hat{\theta }}_{\lambda ,n} -\theta _{0} ){\mathop {\longrightarrow }\limits ^{d}}N\left( 0,\frac{1}{I(\theta _{0} )} \right) \text{ as } n\rightarrow \infty . \end{aligned}$$

Such a sequence \({\hat{\theta }}_{\lambda ,n} \) of roots is typically provided by \(\arg \min _{\theta \in {\varTheta }}S_{\phi _\lambda ,n}(\theta )\), and in this case, the estimator may be referred to as a Minimum Power Divergence Estimator (MPDE). Another family of divergence measures was introduced by Rényi (1961) and extended in Liese and Vajda (1987). These divergences are given by

$$\begin{aligned} R_{\alpha }(\theta )= & {} \frac{1}{\alpha (\alpha -1)}\log \int _{-\infty }^\infty f_\theta (x)^\alpha f_{\theta _0}(x)^{1-\alpha }\mathrm{d}x , \end{aligned}$$

where the cases \(\alpha =0\) and \(\alpha =1\) are given by continuity, i.e., \(R_{0}(\theta )=\lim _{\alpha \rightarrow 0}R_{\alpha }(\theta )=T_{-1}(\theta )\) and \(R_{1}(\theta )=\lim _{\alpha \rightarrow 1}R_{\alpha }(\theta )=T_{0}(\theta ).\) Thus, \(R_{0}(\theta )\) and \(R_{1}(\theta )\) are Kullback–Leibler divergences and belong to the family of power divergences. When \(\alpha \ne 0,1\), \(R_{\alpha }(\theta )\) may be estimated by \(\alpha ^{-1}(\alpha -1)^{-1}\log S_{\phi _\alpha ,n}(\theta )\), where \(\phi _\alpha (x)=x^\alpha \), and if \(\lambda =\alpha -1\) then note that \(\arg \min _{\theta \in {\varTheta }}\alpha ^{-1}(\alpha -1)^{-1}\log S_{\phi _\alpha ,n}(\theta )=\arg \min _{\theta \in {\varTheta }}S_{\phi _\lambda ,n}(\theta )\). Thus, each MPDE may also be regarded as a minimum Rényi divergence estimator.

The Hellinger distance between \(F_\theta \) and \(F_{\theta _0}\) is given by

$$\begin{aligned} H(\theta )= & {} \left( \frac{1}{2}\int _{-\infty }^\infty \left( f_\theta (x)^{1/2}-f_{\theta _0}(x)^{1/2}\right) ^2\mathrm{d}x\right) ^{1/2} , \end{aligned}$$

and may be estimated by \((1+S_{\phi ,n}(\theta ))^{1/2}\), where \(\phi (x)=-x^{1/2}\). In this case, we have, for \(\lambda =-\,1/2\), \(\arg \min _{\theta \in {\varTheta }}(1+S_{\phi ,n}(\theta ))^{1/2}=\arg \min _{\theta \in {\varTheta }}S_{\phi _\lambda ,n}(\theta )\). Thus, the MPDE with \(\lambda =-\,1/2\) may also be referred to as a minimum Hellinger distance estimator.

It is desirable to have statistical estimation procedures that perform well when an assumed parametric model is correctly specified, thereby attaining high efficiency at the assumed model. A problem is that parametric model assumptions are almost never literally true. Thus, in addition, it is desirable to have estimation procedures that are relatively insensitive to small departures from the model assumptions and for which somewhat larger deviations from the model assumptions do not cause a “catastrophe” (Huber and Ronchetti 2009). Procedures satisfying these features are called robust. Due to its relationship with the estimator suggested by Beran (1977), we conjecture that our minimum Hellinger distance estimator, i.e., the MPDE with \(\lambda =-\,1/2\), is robust (in the sense of Beran 1977 and Lindsay 1994). In addition, by arguments put forward in Lindsay (1994), we conjecture that GSEs based on bounded \(\phi \) functions are robust with respect to contaminations of the original data (cf. Mayoral et al. 2003). In the next section, we consider the MPDE with \(\lambda =-\,1/2\) and apply Monte Carlo simulations to compare its performance with those of the MLE and the MPDEs with \(\lambda = -\,1\) and \(-\,0.9\).

3 A simulation study

In this section, we explore the finite sample properties of the Minimum Power Divergence Estimators (MPDEs) \({\hat{\theta }}_{\lambda ,n}=\arg \min _{\theta \in {\varTheta }}S_{\phi _\lambda ,n}(\theta )\), where \(\phi _\lambda \) is defined in Eq. (5).

First, we consider estimating the mean \(\theta \) of a \(N(\theta , 1)\) distribution and compare the root mean square errors (RMSEs) of various MPDEs with the RMSE of the MLE. These are shown in Fig. 1. MPDEs are computed for \(\lambda =-\,1,-\,0.9\), and \(-\,0.5\), and for all values of m, the order of the spacings, that are divisors of n. We define the relative RMSE of an MPDE to be its RMSE divided by the RMSE of the MLE. Each RMSE was estimated from 1000 Monte Carlo samples with \(n=840\) from the \(N(\theta ,1)\) distribution with \(\theta =0\). We present relative RMSEs for m up to 150, because for larger m the relative RMSEs tend to be considerably larger. From Fig. 1, we see that the MPDEs with \(\lambda \) equal to \(-\,1\) or \(-\,0.9\) are about as good as the MLE for comparatively small values of m (and perform less well for larger m). The MPDE with \(\lambda =-\,0.5\) is not quite as good in terms of RMSE; for example, with an optimally chosen m, it had an RMSE about 0.5% larger than that of the MLE. The simulation results indicate that the optimal choice of m is 15, 15, and 14 for \(\lambda =-\,1,-\,0.9\), and \(-\,0.5\), respectively. (Although for \(\lambda =-\,1\) and \(-\,0.9\) there are about ten other candidates, respectively, that perform almost as well.)
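The comparison just described can be reproduced in outline as follows (a rough sketch assuming the `gse` and `phi_lambda` helpers from the earlier sketches are in scope; it uses far fewer replications than the 1000 behind Fig. 1, so the numbers are only indicative).

```python
# Monte Carlo RMSE of an MPDE for the N(theta, 1) mean, relative to the MLE.
import numpy as np

def rmse_mpde(lam, m, n=840, reps=200, theta0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        x = rng.normal(loc=theta0, scale=1.0, size=n - 1)
        errors.append(gse(x, m=m, phi=lambda t: phi_lambda(t, lam)) - theta0)
    return float(np.sqrt(np.mean(np.square(errors))))

# For N(theta, 1) the MLE is the sample mean, with RMSE 1/sqrt(n - 1), so the
# relative RMSE can be approximated by
print(rmse_mpde(lam=-0.5, m=14) * np.sqrt(839))
```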

Fig. 1

RMSEs for different MPDEs relative to the RMSE of the MLE when estimating the mean of a normal distribution. The relative RMSE of an MPDE is its RMSE divided by the RMSE of the MLE (color figure online)

Fig. 2

RMSEs for the MLE and for different MPDEs when estimating the mean of a normal distribution under contamination, where \(\varepsilon \) denotes the level of contamination. Note, black plot symbols are often (partly) hidden behind green plot symbols and red plot symbols behind blue ones (color figure online)

Next, we consider the issue of stability/robustness of these estimators. It is known that the MSP estimator (Cheng and Amin 1983; Ranneby 1984), i.e., the MPDE with \(\lambda =-\,1\) and \(m=1\), much like the MLE, suffers from a lack of stability under even small deviations from the underlying model; i.e., the distributions of the MSP and ML estimators can be greatly perturbed if the assigned model is only approximately true. This is demonstrated in a simulation study by Nordahl (1992) and, for the MLE, in a numerical study by Fujisawa and Eguchi (2008). As in Nordahl (1992), we will assume that a proportion \(1-\varepsilon \) of the data is generated from a normal distribution, while a proportion \(\varepsilon \) is generated by some unknown mechanism that produces “outliers.” For example, measurements may be correct 95% of the time, while 5% of the time operator reading/writing errors are made or the recording instrument malfunctions. Therefore, we assume that a random sample \(\xi _1,\ldots ,\xi _{n-1}\) is generated from an \(\varepsilon \)-contaminated normal distribution \(G( x- \theta _0)\), where

$$\begin{aligned} G(x)=(1-\varepsilon ){\varPhi }(x)+\varepsilon H( x), \end{aligned}$$

in which H(x) denotes an arbitrary distribution that models the outliers and \(\varepsilon \) is the level of contamination. The goal is to estimate \(\theta _0\), the mean of the observations in the case when no recording errors occur. If \(n\varepsilon \) is rather small, we may have few observations from H, making it difficult to assess a model for H. In such a case, instead of modeling the mixture distribution G, one may (wrongly) assume that all observations come from \({\varPhi }(x-\theta )\) with \(\theta =\theta _0\), and then use an estimation method that provides a good estimate of \(\theta _0\) even in the presence of outliers coming from H. That is, robust estimation aims at finding an estimator \({{\hat{\theta }}}\) that efficiently estimates \(\theta _0\) even though the data are contaminated by an outlier distribution H (Fujisawa and Eguchi 2008).
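To make the contamination model concrete, here is a minimal sampler (ours, not from the paper) for an \(\varepsilon \)-contaminated sample from \(G(x-\theta _0)\), with the outlier distribution taken to be the normal \(H(x)={\varPhi }((x-\rho )/\tau )\) used in the simulations below.

```python
# Draw a sample from G(x - theta0), where G = (1 - eps) * Phi + eps * H and
# H is a N(rho, tau^2) distribution modeling the outliers.
import numpy as np

def rcontam(size, theta0=0.0, eps=0.05, rho=10.0, tau=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    from_h = rng.random(size) < eps                        # indicators of outliers
    clean = rng.normal(loc=0.0, scale=1.0, size=size)      # Phi component
    outlier = rng.normal(loc=rho, scale=tau, size=size)    # H component
    return theta0 + np.where(from_h, outlier, clean)
```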

Fig. 3

RMSEs for the one-step MHDE and for two MPDEs, when estimating the mean of a normal distribution under contamination and where \(\varepsilon \) is the level of contamination. Note, red and blue plot symbols are sometimes (partly) hidden behind black plot symbols (color figure online)

In our Monte Carlo simulation, we used \(H(x) = {\varPhi }((x - \rho )/\tau )\), \(\tau > 0\), with \(\rho =10\) and \(\tau =1\). For each \(\varepsilon =0.0,0.1,\ldots ,0.4\), we generated 1000 Monte Carlo samples with \(n=840\) from \(G(x- \theta _0)\) with \(\theta _0=0\), and for every sample, the MLE of \(\theta _0\) was computed using the model \(F_\theta (x) = {\varPhi }(x-\theta )\). MPDEs for this case were also computed for \(\lambda =-\,1,-\,0.9\), and \(-\,0.5\) and for the previously found optimal values of m for the respective values of \(\lambda \). For each level of contamination, we used the 1000 samples for computing estimated RMSEs for the respective estimators of \(\theta _0\). The resulting RMSEs are shown in Fig. 2. In case of contamination, we see that the MPDEs with \(\lambda \) equal to \(-\,0.9\) or \(-\,0.5\) are superior to the MLE and the MPDE with \(\lambda =-\,1\). In other words, the MLE and MPDEs such as the MSP estimator, which all can be derived from the Kullback–Leibler divergence, perform poorly under contamination, and other MPDEs are to be preferred.

Looking at Fig. 1, it is clear that the choice of m, the order of spacings, is important for the quality of the MPDEs. We now propose a data-based approach for choosing m for MPDEs (and more generally for GSEs). No asymptotic optimality is claimed for the approach; the main purpose is rather to provide sensible answers for finite sample sizes. For a given \(\lambda \) (or \(\phi \)-function), let \({{\hat{\theta }}}_m\) denote the MPDE (or GSE) using the order of spacings m. The suggested approach is given by the following algorithm (a code sketch follows the list):

  • Step 1 Compute \({{\hat{\theta }}}_{1}\).

  • Step 2 For r in \(1,\ldots ,R\): Draw a bootstrap sample \(x_{r,1}^\star ,\ldots ,x_{r,n-1}^\star \) from \(F_{{{\hat{\theta }}}_{1}}\). For some set of positive integers, \({{\mathcal {M}}}\), compute \({{\hat{\theta }}}_{r,m}^\star \) for each \(m\in {{\mathcal {M}}}\), where \({{\hat{\theta }}}_{r,m}^\star \) denotes the rth bootstrap replicate of \({{\hat{\theta }}}_m\).

  • Step 3 Choose \(m_{\mathrm{opt}}={{{\,\mathrm{arg\,min}\,}}}_{m\in {{\mathcal {M}}}}\frac{1}{R}\sum _{r=1}^R\left( {{\hat{\theta }}}_{r,m}^\star -{{\hat{\theta }}}_{1}\right) ^2\).
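Here is a compact sketch of Steps 1–3 (the interface and names are ours): `estimator(sample, m)` stands for the GSE or MPDE computed with order-m spacings, e.g., the `gse` helper sketched earlier, and `rdist(theta, size, rng)` draws a parametric bootstrap sample from \(F_{\theta }\).

```python
import numpy as np

def choose_m(sample, estimator, rdist, m_grid, R=100, rng=None):
    """Bootstrap choice of the order of spacings m (Steps 1-3 above)."""
    rng = np.random.default_rng() if rng is None else rng
    theta_1 = estimator(sample, 1)                     # Step 1: estimate with m = 1
    sse = {m: 0.0 for m in m_grid}
    for _ in range(R):                                 # Step 2: parametric bootstrap
        boot = rdist(theta_1, len(sample), rng)
        for m in m_grid:
            sse[m] += (estimator(boot, m) - theta_1) ** 2
    return min(m_grid, key=lambda m: sse[m])           # Step 3: minimize bootstrap MSE

# Example use for the N(theta, 1) location model of the simulations (hypothetical grid):
# m_opt = choose_m(x,
#                  estimator=lambda s, m: gse(s, m, phi=lambda t: phi_lambda(t, -0.5)),
#                  rdist=lambda th, size, rng: rng.normal(loc=th, scale=1.0, size=size),
#                  m_grid=(1, 2, 5, 10, 14, 15, 20, 30, 60))
```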

Under the same settings as in Fig. 2 and with m chosen according to the above algorithm, with \({{\mathcal {M}}}\) defined as the set of divisors of n, we consider two MPDEs, with \(\lambda =-\,0.9\) and \(-\,0.5\), respectively. We compare these with Karunamuni and Wu’s (2011) one-step minimum Hellinger distance estimator (MHDE), obtained from a one-step Newton–Raphson approximation to the solution of the equation \({{\hat{S}}}_\phi '(\theta )=0\), where \({{\hat{S}}}_\phi (\theta )\) is defined as in (2), with \(\phi (x)= \frac{1}{2}|1-\sqrt{x}|^2\) and \({{\hat{f}}}\) a kernel density estimator. Karunamuni and Wu (ibid.) show that their one-step MHDE has the same asymptotic behavior as Beran’s (1977) MHDE, as long as the initial estimator in the Newton–Raphson algorithm is reasonably good, and that it retains the excellent robustness properties of the MHDE. In our simulations, we used the median as the initial estimator of \(\theta _0\), and the kernel estimator was based on the Epanechnikov kernel with bandwidth chosen to be \((15e)^{1/5}(\pi /32)^{1/10}{{\hat{\sigma }}} n^{-1/5}\), where \({\hat{\sigma }}=\text{ median }\{|\xi _i-\text{ median }\{\xi _j\}|\}/{\varPhi }^{-1}(3/4)\) (cf. Basu et al. 2011, pp. 108–109). The resulting RMSEs are shown in Fig. 3. When \(\varepsilon =0\), the MPDE with \(\lambda =-\,0.9\) is the winner in terms of RMSE. (In comparison, the MPDEs with \(\lambda =-\,0.9\) and \(-\,0.5\) and the one-step MHDE had RMSEs that were about 0.0, 1.8, and 0.4 percent larger than that of the MLE, respectively.) For \(\varepsilon =0.1\), the one-step MHDE performs somewhat better than the two MPDEs, but for \(\varepsilon =0.2, 0.3,\) and 0.4, the most efficient estimator is the MPDE with \(\lambda =-\,0.5\). Under heavy contamination, i.e., for levels of contamination equal to or larger than 0.3, both MPDEs are clearly more robust than the one-step MHDE.
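As a concrete illustration of the nonparametric stage of this comparison, the Epanechnikov kernel density estimate with the bandwidth and MAD-type scale estimate quoted above can be computed as follows (our sketch; whether \(n\) or \(n-1\) is used in the bandwidth is immaterial here).

```python
import numpy as np
from scipy.stats import norm

def epanechnikov_kde(x_eval, sample):
    """Kernel density estimate as used in the one-step MHDE comparison."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    sigma_hat = np.median(np.abs(sample - np.median(sample))) / norm.ppf(0.75)
    h = (15 * np.e) ** (1 / 5) * (np.pi / 32) ** (1 / 10) * sigma_hat * n ** (-1 / 5)
    u = (np.asarray(x_eval, dtype=float)[:, None] - sample[None, :]) / h
    weights = 0.75 * np.maximum(1.0 - u ** 2, 0.0)     # Epanechnikov kernel
    return weights.mean(axis=1) / h
```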

By Corollary 1, MPDEs are asymptotically normal and efficient under a general set of regularity conditions. When applying an MPDE, a particular value of \(\lambda \) needs to be chosen. In our simulations, we considered three choices, \(\lambda =-\,1,-\,0.9\), and \(-\,0.5\). Much like the MLE, the MPDE with \(\lambda =-\,1\) can be greatly perturbed if the assigned model is only approximately true. In the choice between \(\lambda = -\,0.9\) and \(\lambda = -\,0.5\), the former appears to provide better estimates if there is no contamination, while the latter seems to give more robust estimators when some contamination is suspected.

4 Concluding remarks

In this paper, we propose classes of estimators, called Generalized Spacings Estimators or GSEs, based on non-overlapping higher-order spacings and show that, under some regularity conditions, they are consistent and asymptotically normal. Within these classes, we demonstrate the existence of asymptotically efficient estimators, called MPDEs. Through simulations, we demonstrate that they also perform well relative to the MLE in moderate sample sizes. However, unlike the MLE, some of these spacings estimators are quite robust under contamination. In this article, we also propose a data-driven choice for the order of spacings, m, based on bootstrapping, and the Monte Carlo studies indicate that this practical way of choosing m leads to MPDEs that perform comparably well to, and at higher levels of contamination much better than, the one-step MHDE proposed in the literature. Moreover, the GSEs suggested here can be suitably extended and used in more general situations. For example, by using mth nearest neighbor balls as a multidimensional analogue of univariate mth-order spacings, our proposed classes of estimators can be extended to multivariate observations (Kuljus and Ranneby (2015) studied this problem for \(m=1\)), but the specifics need further exploration. Another possibility is to define GSEs based on overlapping spacings of order m. The derivation of the asymptotic distribution of such estimators is an open problem.