Original articles
A test for the geometric distribution based on linear regression of order statistics

https://doi.org/10.1016/j.matcom.2020.08.023

Abstract

This paper proposes and studies a novel test for the geometric distribution, based on a characterization of that law in terms of the conditional expectation of the second order statistic given the value of the first order statistic. The asymptotic null distribution of the test statistic and its limit under general conditions are derived, proving that the test is consistent against fixed alternatives. It can also detect alternatives converging to the null at the rate $n^{-1/2}$, $n$ denoting the sample size. A weighted bootstrap and a parametric bootstrap can be used to consistently estimate the null distribution. The finite sample performance of these two bootstrap approximations is assessed via simulation. The power of the new test is numerically compared with that of some existing tests, showing that the proposal is competitive.

Introduction

The geometric distribution is a count model with applications in many research areas, such as lifetime analysis, where it is the discrete counterpart of the exponential law, and capture–recapture methods, where it emerges as a Poisson mixing distribution (see, e.g. [2], [26]), among others. Let $X$ be a random variable taking positive integer values, $X\in\mathbb{N}$, with probability law $P(X=j)=q^{j-1}p$, $j\geq 1$ (1.1), where $q=1-p$, for some $p\in(0,1)$. We then say that $X$ has a geometric distribution with parameter $p=1-q$ and write $X\sim\mathrm{Geo}(p)$.
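As an illustration of the definition above, here is a minimal Python sketch of the probability law on {1, 2, ...} and of a sampler obtained by inverting the distribution function. The function names `geo_pmf` and `geo_sample` are hypothetical and do not come from the paper.

```python
import math
import random


def geo_pmf(j, p):
    """P(X = j) = q**(j - 1) * p for j = 1, 2, ..., with q = 1 - p."""
    if j < 1:
        return 0.0
    return (1.0 - p) ** (j - 1) * p


def geo_sample(p, rng):
    """Draw one Geo(p) variate on {1, 2, ...} by inversion.

    Uses P(X > j) = q**j, so X is the smallest j with q**j <= u.
    """
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.3
    draws = [geo_sample(p, rng) for _ in range(10000)]
    # The sample mean should be close to E(X) = 1/p.
    print(sum(draws) / len(draws), 1 / p)
```

Note that the support starts at 1, so E(X) = 1/p; some software libraries instead count failures before the first success, which shifts the support to {0, 1, ...}.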

Testing the goodness-of-fit (gof) of given observations with a probabilistic model is a crucial aspect of data analysis. Let $X\in\mathbb{N}$. In this paper we consider the problem of testing $H_0: X\sim\mathrm{Geo}(p)$, for some $p\in(0,1)$, against the general alternative $H_1: X\nsim\mathrm{Geo}(p)$, $\forall p\in(0,1)$. Pearson's $\chi^2$ test is commonly used for this testing problem. This test has the nice property that its test statistic is asymptotically distribution-free under the null hypothesis, provided that estimation of parameters is done properly. Its practical application presents two main problems: first, cell selection is not a clear-cut task; second, the goodness of the $\chi^2$ approximation to the null distribution requires rather large sample sizes (see, e.g. [1] for this point in another testing framework). In addition, this test is not consistent against all alternatives. The smooth test numerically studied in [3] shares the same shortcoming. The general tests proposed in [14], [17], [22], [30] can be applied to testing $H_0$, and all of them are consistent against fixed alternatives. A diagnostic tool, the ratio plot, has been investigated in [6] (see also [2], [5]) to graphically check $H_0$. [12] proposed a gof test that can be applied to any discrete law having a power series distribution. In particular, it can be applied to testing $H_0$. Nevertheless, its practical application presents some difficulties (specifically, the tabulation of all “arrangements” for each possible value of $t$, using the nomenclature in that paper).

This paper proposes and studies a novel gof test of the geometric distribution. The test is based on a characterization of that distribution introduced in [24], in terms of the conditional expectation of the second order statistic, given the value of the first order statistic, which is linear if and only if the law is geometric. Then, using the well-known Bierens [4] characterization of conditional moments, a test statistic is proposed in Section 2. Section 3 derives the almost sure limit of the test statistic, its asymptotic null distribution and its distribution under contiguous alternatives. It is concluded that the test that rejects for “large” values of the test statistic is able to detect any fixed alternative and detects alternatives converging to the null at the rate $1/\sqrt{n}$. Since the asymptotic null distribution of the test statistic depends on unknown parameters, it cannot be used directly to approximate the null distribution. Section 4 studies two null distribution estimators, a weighted bootstrap and a parametric bootstrap, which are proven to be consistent. Section 5 summarizes the results of a simulation study, designed to assess the finite sample performance of the test when the null distribution is estimated by using the methods studied in Section 4, and to compare it with some competitors. The simulation results reveal that the new test has a very competitive performance, and hence it deserves to be included in any battery of gof tests for the geometric distribution. Finally, we applied the new test to a real data set. The proofs are deferred to Section 6. Section 7 displays the R code used to calculate the new test statistic. All limits in this paper are taken as $n\to\infty$, where $n$ denotes the sample size.

Section snippets

The test statistic

Let $X_1,\ldots,X_n$ be independent and identically distributed (iid) discrete random variables taking positive integer values, with $P(X_1=j)=p_j>0$, for $j\geq 1$. Let $X_{1:n}\leq\cdots\leq X_{n:n}$ denote the order statistics. Theorem 3 of [24] shows that $E(X_{2:n}\mid X_{1:n}=j)=j+a$, $j\geq 1$, for a certain $a>0$, if and only if the probability mass function of $X_1$ satisfies (1.1) for $q\in(0,1)$ being the solution of the equation $\frac{n}{a}=\frac{1-q^{n}}{1-q}\,\frac{1-q^{n-1}}{q^{n-1}}$. For $n=2$ and denoting $M=\min\{X_1,X_2\}=X_{1:2}$ and $D=|X_1-X_2|=X_{2:2}-X_{1:2}$, the above characterization can be
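The characterization can be checked numerically. The sketch below is an illustration in Python, not code from the paper. It assumes the equation in the characterization reads n/a = (1 - q^n)/(1 - q) · (1 - q^(n-1))/q^(n-1), solves it for a, and compares that value with Monte Carlo estimates of E(X_{2:n} - X_{1:n} | X_{1:n} = j) under a geometric law. The names `offset_a`, `geo_sample` and `conditional_gaps` are hypothetical.

```python
import random


def offset_a(n, q):
    """Constant a solving n/a = (1 - q^n)/(1 - q) * (1 - q^(n-1))/q^(n-1)."""
    return n * (1 - q) * q ** (n - 1) / ((1 - q ** n) * (1 - q ** (n - 1)))


def geo_sample(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    x = 1
    while rng.random() >= p:
        x += 1
    return x


def conditional_gaps(n, p, reps, rng):
    """Monte Carlo mean of X_{2:n} - X_{1:n}, grouped by the value of X_{1:n}."""
    sums, counts = {}, {}
    for _ in range(reps):
        xs = sorted(geo_sample(p, rng) for _ in range(n))
        j, gap = xs[0], xs[1] - xs[0]
        sums[j] = sums.get(j, 0) + gap
        counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sorted(sums)}


if __name__ == "__main__":
    rng = random.Random(1)
    p, n = 0.4, 3
    print("a from the equation:", offset_a(n, 1 - p))
    # Under Geo(p) the estimates should be roughly constant in j and close to a
    # (values of j with few observations are noisy).
    print(conditional_gaps(n, p, 20000, rng))
```

For n = 2 the constant reduces to a = 2q/(1 - q^2), which is the mean of D = |X_1 - X_2| under the geometric law.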

Asymptotic properties

We first calculate the limit of the proposed test statistic under general distributional assumptions.

Theorem 4

Let $X_1,\ldots,X_n$ be iid from $X$, a random variable taking values in $\mathbb{N}$ such that $1<E(X)<\infty$. Then $T_n\xrightarrow{a.s.}\tau_X=\int|S_X(t)|^2w(t)\,dt$, where $S_X(t)=E\{(D-a_X)e^{itM}\}$ and $a_X=\frac{2E(X)\{E(X)-1\}}{2E(X)-1}$.
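As a quick consistency check of the constant in the theorem (a worked computation added for illustration, not part of the original text), take X to be geometric with parameter p, so that E(X) = 1/p and q = 1 - p. The second line assumes the characterization equation of Section 2 reads n/a = (1 - q^n)/(1 - q) · (1 - q^(n-1))/q^(n-1), evaluated at n = 2.

```latex
\begin{align*}
a_X &= \frac{2E(X)\{E(X)-1\}}{2E(X)-1}
     = \frac{2\cdot\frac{1}{p}\cdot\frac{q}{p}}{\frac{2-p}{p}}
     = \frac{2q}{p(2-p)}
     = \frac{2q}{(1-q)(1+q)}
     = \frac{2q}{1-q^{2}}, \\
\frac{2}{a} &= \frac{1-q^{2}}{1-q}\cdot\frac{1-q}{q}
     = \frac{1-q^{2}}{q}
     \quad\Longrightarrow\quad a = \frac{2q}{1-q^{2}} = a_X .
\end{align*}
```

Both routes give the same value, so under the geometric law the centering constant a_X coincides with the offset a of the characterization for n = 2.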

Notice that $\tau_X\geq 0$. Under the null hypothesis we have that $\tau_X=0$. Moreover, since the weight function is positive, we have that $\tau_X=0$ if and only if $H_0$ is true. Therefore, as intuitively stated in Section 2, a reasonable test should

Approximating the null distribution

This section studies two estimators of the null distribution of $T_n$: a weighted bootstrap estimator and a parametric bootstrap estimator.
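To make the parametric bootstrap idea concrete, here is a minimal Python sketch under stated assumptions: the parameter is estimated by the reciprocal of the sample mean (since E(X) = 1/p), bootstrap samples are drawn from the fitted geometric law, and the p-value is the proportion of bootstrap statistics at least as large as the observed one. The statistic is passed in as a function; `toy_statistic` is a simple stand-in used only to keep the example short and is NOT the statistic $T_n$ of the paper. All names are hypothetical.

```python
import random


def geo_sample(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    x = 1
    while rng.random() >= p:
        x += 1
    return x


def toy_statistic(xs):
    """Stand-in statistic (not T_n): |sample variance - mean * (mean - 1)|.

    It is near 0 under Geo(p) because Var(X) = E(X){E(X) - 1}.
    """
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    return abs(v - m * (m - 1))


def parametric_bootstrap_pvalue(xs, statistic, n_boot, rng):
    """Bootstrap p-value for a test that rejects for large values of `statistic`."""
    n = len(xs)
    p_hat = n / sum(xs)  # assumed estimator: reciprocal of the sample mean
    t_obs = statistic(xs)
    exceed = 0
    for _ in range(n_boot):
        xs_star = [geo_sample(p_hat, rng) for _ in range(n)]
        if statistic(xs_star) >= t_obs:
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    rng = random.Random(2)
    data = [geo_sample(0.3, rng) for _ in range(50)]
    print(parametric_bootstrap_pvalue(data, toy_statistic, 200, rng))
```

Passing the statistic as an argument keeps the resampling loop independent of the particular test, so any function of the sample that rejects for large values can be plugged in.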

Numerical results

The results stated so far are asymptotic, that is, they are valid for large sample sizes. With the aim of studying the finite sample performance of the proposed test, we carried out some simulation studies. Section 5.1 summarizes the outcomes of two experiments designed to compare the approaches in Section 4 to approximate the null distribution of $T_n$, that is, in terms of the level. Section 5.2 reports the results of comparing the proposal in this paper with other existing tests of $H_0$ in terms of

Proofs

Throughout this section, $C$ is a generic positive constant that may take different values at different points of the proofs.

Proof of Theorem 4

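We have that $S_n(t)=S_{1n}(t)+S_{2n}(t)$, with $S_{1n}(t)=\frac{1}{n(n-1)}\sum_{1\leq j\neq k\leq n}(D_{jk}-a)\{\cos(tM_{jk})+\sin(tM_{jk})\}$, $S_{2n}(t)=\frac{a-\hat{a}}{n(n-1)}\sum_{1\leq j\neq k\leq n}\{\cos(tM_{jk})+\sin(tM_{jk})\}$, and $a$ is as defined in (2.1). From the strong law of large numbers (SLLN) for U-statistics (see, e.g. [31]), $S_{1n}(t)\xrightarrow{a.s.}S_X(t)$, $\forall t\in\mathbb{R}$. Since $|\cos(tM_{jk})|\leq 1$ and $|\sin(tM_{jk})|\leq 1$, $\forall t\in\mathbb{R}$, and $0\leq D_{jk}\leq X_k+X_j$, $1\leq j\neq k\leq n$, it follows that $|S_{1n}(t)|\leq 4\bar{X}+2a$, $\forall t\in\mathbb{R}$. From the SLLN, $\bar{X}\xrightarrow{a.s.}E(X)<\infty$.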

R code for the exact calculation of $T_n$

This section displays the function we wrote for the exact calculation of the test statistic $T_n$, with weight function $w$ the probability density function of a normal law with mean 0 and variance $\beta$. The inputs are x = vector containing the data, and beta = variance of the weight function.
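The authors' R function is not reproduced in this excerpt. The following Python sketch shows one way such an exact calculation could go, under assumptions that are not confirmed by the text above: that $T_n=\int S_n(t)^2w(t)\,dt$ with $S_n(t)=\frac{1}{n(n-1)}\sum_{j\neq k}(D_{jk}-\hat{a})\{\cos(tM_{jk})+\sin(tM_{jk})\}$, and that $\hat{a}$ is obtained by replacing $E(X)$ with the sample mean in $a_X$. With a normal weight of mean 0 and variance $\beta$, the identity $(\cos u+\sin u)(\cos v+\sin v)=\cos(u-v)+\sin(u+v)$ makes the sine part integrate to zero and leaves the Gaussian kernel $\exp\{-\beta(M_{jk}-M_{lm})^2/2\}$. The name `tn_statistic` is hypothetical.

```python
import math
from collections import defaultdict


def tn_statistic(x, beta):
    """Closed-form value of the assumed T_n for a normal N(0, beta) weight.

    x    : list of positive integers (the data), with len(x) >= 2
    beta : variance of the weight function
    """
    n = len(x)
    xbar = sum(x) / n
    # Assumed plug-in estimator of a_X = 2E(X){E(X) - 1} / {2E(X) - 1}.
    a_hat = 2 * xbar * (xbar - 1) / (2 * xbar - 1)
    # s[m] = sum of (D_jk - a_hat) over ordered pairs j != k with M_jk = m.
    s = defaultdict(float)
    for j in range(n):
        for k in range(j + 1, n):
            m = min(x[j], x[k])
            d = abs(x[j] - x[k])
            s[m] += 2 * (d - a_hat)  # (j, k) and (k, j) contribute equally
    # Quadratic form in the Gaussian kernel exp(-beta * (m1 - m2)^2 / 2).
    total = 0.0
    for m1, s1 in s.items():
        for m2, s2 in s.items():
            total += s1 * s2 * math.exp(-beta * (m1 - m2) ** 2 / 2)
    return total / (n * (n - 1)) ** 2


if __name__ == "__main__":
    print(tn_statistic([1, 3, 2, 5, 1, 2], beta=1.0))
```

Grouping the pairs by the value of the minimum keeps the double sum over distinct minima rather than over all pairs of pairs, which matters because the number of pairs grows like n squared.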

Acknowledgments

The authors thank two anonymous referees for their constructive comments and suggestions, which helped to improve the presentation. The research in this paper has been partially funded by grants CTM2015–68276–R of the Spanish Ministry of Economy and Competitiveness (M.V. Alba-Fernández) and MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, ERDF support included (M.D. Jiménez-Gamero).

References (35)

  • Böhning D. et al.

    The geometric distribution, the ratio plot under the null and the burden of dengue fever in Chiang Mai province

  • Chen X. et al.

    Central limit and functional central limit theorems for Hilbert-valued dependent heterogeneous arrays with applications

    Econom. Theory

    (1998)

  • Dehling H. et al.

    Random quadratic forms and the bootstrap for U-statistics

    J. Multivariate Anal.

    (1994)

  • Escanciano J.C.

    Goodness-of-fit tests for linear and nonlinear time series models

    J. Amer. Statist. Assoc.

    (2006)

  • Giacomini R. et al.

    A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators

    Econom. Theory

    (2013)

  • González-Barrios J.M. et al.

    Goodness of fit for discrete random variables using the conditional density

    Metrika

    (2006)

  • Henze N.

    Empirical-distribution-function goodness-of-fit tests for discrete models

    Can. J. Stat.

    (1996)