When the score function is the identity function - A tale of characterizations of the normal distribution
Introduction
The normal or Gaussian distribution is the most popular probability law in statistics and probability. The reasons for this popularity are manifold, including the nice bell curve shape, the simple form of the density \(\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\) with easily interpretable location parameter \(\mu\) and scale parameter \(\sigma\), the ensuing mathematical tractability, the straightforward extension to the multivariate normal density (which we shall, however, not deal with in this paper), and the fact of being the limit distribution in the Central Limit Theorem. Besides these major appeals, the normal distribution is also famous for satisfying various characterizations, the latter being theoretical results that only one distribution (or one class of distributions) fulfils. Carl Friedrich Gauss himself obtained the normal density by searching for a probability distribution where the maximum likelihood estimator of the location parameter always (see Section 2.1 for a precise meaning) coincides with the most intuitive estimator, namely the sample average. Numerous other characterizations of this popular distribution have followed, and in general it took researchers decades to extend them to other distributions, often in an ad hoc way.
For the sake of historical correctness, it is necessary to recall that the “Gaussian distribution” is a perfect example of Stigler’s law of eponymy (Stigler, 1980) because it was first introduced by de Moivre in 1738. For more information, we refer the interested reader to the insightful paper Le Cam (1986).
In the present paper, we will show that an apparently inessential characterization of the normal distribution turns out to be a crucial building block in several more famous characterizations. This characterization is the fact that \((\log\phi(x))' = -x\) or, equivalently, \(-\phi'(x)/\phi(x) = x\), where \(\phi\) denotes the standard normal density. In the former notation we speak of the derivative of the log-density, while the second case features the location score function (we will refer to both settings as the “identity function” or “location score function”). It is straightforward to see that the normal distribution is the only one for which these results hold. We shall show in the remainder of this paper that this particular characterization of the normal distribution via the identity function lies at the core of many characterizations that confer its special role on the normal distribution. We will illustrate this fact by means of four totally unrelated examples from the literature, namely the maximum likelihood characterization (Section 2.1), a singular Fisher information matrix characterization within skew-symmetric distributions (Section 2.2), Stein characterizations (Section 2.3) and a characterization related to variance bounds (Section 2.4). In each case, we indicate where the identity function plays its role and how, by replacing it with \(-f'(x)/f(x)\) for some general density \(f\), the characterization that seemed tailor-made for the normal distribution can in fact be extended to other distributions. For some examples we bring in an innovative viewpoint in the proofs, for others not. We wish to stress that the examples are not the goal of this paper; rather, they serve the purpose of illustrating the global vision of our paper, namely that recognizing the identity function \(x \mapsto x\) as the score function \(-\phi'(x)/\phi(x)\)
allows a better understanding of various characterization results of the normal distribution;
yields a simple tool for extending a result for the normal to virtually any distribution.
The reason why we wish to stress the characterization \(-\phi'(x)/\phi(x) = x\) is that it easily goes unnoticed. If one were to find \(-f'(x)/f(x)\) in the examples shown in Sections 2.1–2.4, it would be obvious, and not surprising, that the result holds for the density \(f\) and can be generalized to another density \(g\) by replacing \(-f'/f\) with \(-g'/g\). In the case of the identity function, however, the underlying density is hidden (of course, not every occurrence of \(x\) is associated with the normal distribution).
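As a quick numerical sanity check (a minimal sketch of ours, not part of the paper's argument; the function names are illustrative), one can compare the finite-difference location score of the standard normal density with the identity function, and contrast it with the score of a non-normal density such as the logistic:

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def score(f, x, h=1e-6):
    # location score -f'(x)/f(x), with f' approximated by a central finite difference
    return -(f(x + h) - f(x - h)) / (2 * h) / f(x)

# the normal score is (up to numerical error) the identity function
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(score(phi, x) - x) < 1e-4

def logistic(x):
    # standard logistic density; its score is tanh(x/2), not the identity
    return math.exp(-x) / (1 + math.exp(-x)) ** 2

assert abs(score(logistic, 2.0) - math.tanh(1.0)) < 1e-4
```

The check illustrates the uniqueness claim in reverse: any density whose score deviates from the identity, like the logistic above, cannot be normal.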
Finally, bearing in mind that the issues we just discussed are based on the location score function, we explain in Section 3 how alternative characterizations can be obtained by rather looking at the scale score function. We conclude the paper with final comments in Section 4.
Maximum likelihood characterization
We call location MLE characterization the characterization of a probability distribution via the structure of the Maximum Likelihood Estimator (MLE) of the location parameter. Gauss (1809) showed that, in a location family with differentiable density \(f(\cdot-\mu)\), the MLE for \(\mu\) is the sample mean for all samples of all sample sizes if, and only if, \(f\) is the normal density. This result has been successively refined in two directions. On the one hand, several authors have
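Gauss's result can be illustrated numerically (a hedged sketch of ours with simulated data and a crude grid search, not the paper's machinery): for a normal location model the MLE coincides with the sample mean, whereas for a Laplace location model it is a sample median.

```python
import random

random.seed(0)
sample = [random.gauss(1.5, 1.0) for _ in range(201)]  # odd n: unique median

def normal_loglik(mu, xs):
    # log-likelihood of a N(mu, 1) location model, up to an additive constant
    return sum(-0.5 * (x - mu) ** 2 for x in xs)

def laplace_loglik(mu, xs):
    # log-likelihood of a Laplace location model (scale 1), up to a constant
    return sum(-abs(x - mu) for x in xs)

def argmax_on_grid(loglik, xs, lo=-5.0, hi=5.0, steps=10001):
    # crude grid search for the maximizer of the log-likelihood
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda mu: loglik(mu, xs))

mean = sum(sample) / len(sample)
median = sorted(sample)[len(sample) // 2]

mle_normal = argmax_on_grid(normal_loglik, sample)
mle_laplace = argmax_on_grid(laplace_loglik, sample)

# normal model: the location MLE is the sample mean (up to the grid resolution)
assert abs(mle_normal - mean) < 1e-3
# Laplace model: the location MLE is the sample median, not the mean
assert abs(mle_laplace - median) < 1e-2
```

The Laplace case shows why the "for all samples" clause matters: the mean maximizes the likelihood only for the normal density.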
Further extensions by means of the scale score function
So far our extensions of characterization theorems of the normal distribution to any other continuous distribution with density \(f\) have been based on the score function \(-f'(x)/f(x)\) as natural extension of the normal score function \(x \mapsto x\). It is important to keep in mind that these are location score functions, as described in the Introduction, and consequently all described characterizations are location-based. For certain distributions this may lead to somewhat artificial results if, for
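For concreteness (in our own notation, since the paper's displayed formulas are omitted here), the scale score arises from differentiating the log-likelihood of the scale family \(\sigma^{-1} f(x/\sigma)\) with respect to \(\sigma\):

```latex
% Scale score of the family \sigma^{-1} f(x/\sigma), evaluated at \sigma = 1:
\frac{\partial}{\partial\sigma}\,
  \log\!\Big(\tfrac{1}{\sigma}\,f\big(\tfrac{x}{\sigma}\big)\Big)\Big|_{\sigma=1}
  = -1 - x\,\frac{f'(x)}{f(x)}.
% For the standard normal density \phi, where -\phi'(x)/\phi(x) = x,
% this reduces to x^2 - 1.
```

Thus, just as the identity function signals the normal distribution in the location setting, the function \(x \mapsto x^2 - 1\) plays the analogous role in the scale setting.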
Final comments
We hope to have conveyed, through the previous examples from very different topics, the important message that many characterizations of the normal distribution and, consequently, the seemingly special role of the normal distribution, are (at least to a large degree) to be attributed to the fact that its score function is the identity function, which happens to appear in many circumstances. While a general score function of the form \(-f'(x)/f(x)\) would immediately hint at a special role played by the
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The author would like to thank the Editor, Associate Editor and three anonymous reviewers for insightful comments that helped to improve the present paper.