
Econometrics and Statistics

Volume 26, April 2023, Pages 153-160

When the score function is the identity function - A tale of characterizations of the normal distribution

https://doi.org/10.1016/j.ecosta.2020.10.001

Abstract

The normal distribution is well known for several results that it is the only one to fulfil. Much less well known is the fact that many of these characterizations follow from the derivative of the log-density of the normal distribution being the (negative) identity function. This a priori very simple yet surprising observation allows a deeper understanding of existing characterizations and paves the way for an immediate extension of various seemingly normal-based characterizations to a general density p, by replacing the (negative) identity function in these results with the derivative of the log-density of p.

Introduction

The normal or Gaussian distribution is the most popular probability law in statistics and probability. The reasons for this popularity are manifold, including the nice bell curve shape, the simple form of the density

x ↦ ϕ_{μ,σ}(x) := (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²)),  x ∈ ℝ,

with easily interpretable location parameter μ ∈ ℝ and scale parameter σ > 0, the ensuing mathematical tractability, the straightforward extension to the multivariate normal density (which we shall however not deal with in this paper), and the fact of being the limit distribution in the Central Limit Theorem. Besides these major appeals, the normal distribution is also famous for satisfying various characterizations, the latter being theoretical results that only one distribution (or one class of distributions) fulfils. Carl Friedrich Gauss himself obtained the normal density by searching for a probability distribution where the maximum likelihood estimator of the location parameter always (see Section 2.1 for a precise meaning) coincides with the most intuitive estimator, namely the sample average. Numerous other characterizations of this popular distribution have followed, and in general it took researchers decades to extend them to other distributions, often in an ad hoc way.

For the sake of historical correctness, it is necessary to recall that the “Gaussian distribution” is a perfect example of Stigler’s law of eponymy (Stigler, 1980), because it was first introduced by de Moivre in 1738. For more information, we refer the interested reader to the insightful paper by Le Cam (1986).

In the present paper, we will show that an apparently inessential characterization of the normal distribution turns out to be a crucial building block in several more famous characterizations. This characterization is the fact that (log ϕ_{0,1}(x))′ = −x or, equivalently, (d/dμ) log ϕ_{μ,σ}(x) = (x − μ)/σ². In the former notation we speak of the derivative of the log-density, while the second case features the location score function (we will refer to both settings as the “identity function” or “location score function”). It is straightforward to see that the normal distribution is the only one for which these identities hold. We shall show in the remainder of this paper that this particular characterization of the normal distribution via the identity function lies at the core of many characterizations that convey its special role to the normal distribution. We will illustrate this fact by means of four totally unrelated examples from the literature, namely the maximum likelihood characterization (Section 2.1), a singular Fisher information matrix characterization within skew-symmetric distributions (Section 2.2), Stein characterizations (Section 2.3) and a characterization related to variance bounds (Section 2.4). In each case, we indicate where the identity function plays its role and how, by replacing it with (log p(x))′ for some general density p, the characterization that seemed tailor-made for the normal distribution can in fact be extended to other distributions. For some examples we bring an innovative viewpoint to the proofs, for others not. We wish to stress that the examples are not the goal of this paper; they serve the purpose of illustrating the global vision of our paper, namely that recognizing x as −(log ϕ_{0,1}(x))′

  • allows a better understanding of various characterization results of the normal distribution;

  • yields a simple tool for extending a result for the normal to virtually any distribution.
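The observation above can be made concrete with a small numerical sketch (ours, for illustration only): the recipe x ↦ −(log p(x))′ returns the identity function when p is the standard normal density, and returns the score of any other smooth density by the very same computation. Here the logistic density serves as the "other" example; its score tanh(x/2) is a standard closed form.

```python
import math

def log_phi(x):
    # log-density of the standard normal N(0, 1)
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def score(logpdf, x, h=1e-6):
    # -(log p)'(x), approximated by a central finite difference
    return -(logpdf(x + h) - logpdf(x - h)) / (2 * h)

# For the standard normal, the score is the identity: -(log phi_{0,1})'(x) = x
for x in (-2.0, 0.5, 3.0):
    assert abs(score(log_phi, x) - x) < 1e-5

# The same recipe applied to the standard logistic density
# p(x) = e^{-x} / (1 + e^{-x})^2 yields its score tanh(x / 2).
def log_logistic(x):
    return -x - 2 * math.log1p(math.exp(-x))

for x in (-1.0, 0.0, 2.0):
    assert abs(score(log_logistic, x) - math.tanh(x / 2)) < 1e-5
```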

The reason why we wish to stress the (log ϕ_{0,1}(x))′ = −x characterization is that it goes unnoticed. If one were to find (log p(x))′ in the examples shown in Sections 2.1-2.4, it would be obvious, and not surprising, that the result holds for the density p and can be generalized to another density q by replacing p with q. In the case of x, the underlying density ϕ_{0,1}(x) is hidden (of course, not every x is associated with the normal distribution).

Finally, bearing in mind that the issues we just discussed are based on the location score function, we explain in Section 3 how alternative characterizations can be obtained by rather looking at the scale score function. We conclude the paper with final comments in Section 4.

Section snippets

Maximum likelihood characterization

We call location MLE characterization the characterization of a probability distribution via the structure of the Maximum Likelihood Estimator (MLE) of the location parameter. Gauss (1809) showed that, in a location family p(x − μ) with differentiable density p, the MLE for μ is the sample mean x̄ = (1/n) Σ_{i=1}^{n} x_i for all samples (x₁, …, xₙ) of all sample sizes n if, and only if, p is the normal density. This result has been successively refined in two directions. On the one hand, several authors have…
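Gauss's result can be illustrated numerically (a sketch of ours, not taken from the paper): maximizing the location likelihood Σ log p(xᵢ − μ) over μ recovers the sample mean when p is normal, whereas for a different location family, e.g. the Laplace density, the same optimization lands on the sample median instead.

```python
import statistics

def mle_location(logpdf, sample, lo=-10.0, hi=10.0, iters=200):
    # Ternary search for the mu maximizing sum_i logpdf(x_i - mu);
    # valid here because both log-likelihoods below are unimodal in mu.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        ll1 = sum(logpdf(x - m1) for x in sample)
        ll2 = sum(logpdf(x - m2) for x in sample)
        if ll1 < ll2:
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def log_phi(u):
    return -0.5 * u * u        # normal log-density, up to a constant

def log_laplace(u):
    return -abs(u)             # Laplace log-density, up to a constant

sample = [0.3, 1.7, -0.4, 2.2, 0.9]

# Normal location family: the MLE coincides with the sample mean.
assert abs(mle_location(log_phi, sample) - statistics.mean(sample)) < 1e-6

# Laplace location family: the MLE is the sample median instead.
assert abs(mle_location(log_laplace, sample) - statistics.median(sample)) < 1e-3
```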

Further extensions by means of the scale score function

So far our extensions of characterization theorems of the normal distribution to any other continuous distribution with density p have been based on the score function φ_p(x) = −p′(x)/p(x) as the natural extension of x, the normal score function. It is important to keep in mind that these are location score functions, as described in the Introduction, and consequently all described characterizations are location-based. For certain distributions this may lead to somewhat artificial results if, for…
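As a small illustration of the scale-based counterpart (our sketch, assuming the scale score is defined as the σ-derivative of log((1/σ) p(x/σ)) evaluated at σ = 1), the scale score of the standard normal works out to x² − 1, i.e. x times the location score minus one:

```python
import math

def log_phi(x):
    # log-density of the standard normal N(0, 1)
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def scale_score(logpdf, x, h=1e-6):
    # d/d(sigma) of log( (1/sigma) p(x / sigma) ) at sigma = 1,
    # approximated by a central finite difference in sigma
    f = lambda s: -math.log(s) + logpdf(x / s)
    return (f(1 + h) - f(1 - h)) / (2 * h)

# For the standard normal, the scale score at sigma = 1 is x^2 - 1.
for x in (-1.5, 0.0, 2.0):
    assert abs(scale_score(log_phi, x) - (x * x - 1)) < 1e-4
```

Whereas the location score of the normal is the identity, its scale score is a quadratic; this is the kind of alternative characterization the section refers to.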

Final comments

We hope to have conveyed, through the previous examples from very different topics, the important message that many characterizations of the normal distribution and, consequently, the seemingly special role of the normal distribution, are (at least to a large degree) to be attributed to the fact that its score function is the identity function, which happens to appear in many circumstances. While a general score function of the form −p′(x)/p(x) would immediately hint at a special role played by the…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The author would like to thank the Editor, Associate Editor and three anonymous reviewers for insightful comments that helped to improve the present paper.

References (40)

  • A. Azzalini et al., On Gauss’s characterization of the normal distribution, Bernoulli (2007)
  • A. Barp et al., Minimum Stein discrepancy estimators, Neural Information Processing Systems (2019)
  • S. Betsch et al., Testing normality via a distributional fixed point property in the Stein characterization, TEST (2020)
  • T. Cacoullos, On upper and lower bounds for the variance of a function of a random variable, The Annals of Probability (1982)
  • S. Chatterjee et al., Exponential approximation by Stein’s method and spectral graph theory, ALEA Latin American Journal of Probability and Mathematical Statistics (2011)
  • H. Chernoff, A note on an inequality involving the normal distribution, The Annals of Probability (1981)
  • M. Chiogna, A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution, Statistical Methods and Applications (2005)
  • M. Duerinckx et al., Maximum likelihood characterization of distributions, Bernoulli (2014)
  • M. Ernst et al., First order covariance inequalities via Stein’s method, Bernoulli (2020)
  • C.F. Gauss, Theoria motus corporum coelestium in sectionibus conicis solem ambientium (1809)