Distance-Based Statistical Inference

Marianthi Markatou; Dimitrios Karlis; Yuxin Ding

doi:10.1146/annurev-statistics-031219-041228

Annual Review of Statistics and Its Application

Volume 8, 2021

Review Article


Marianthi Markatou¹, Dimitrios Karlis², and Yuxin Ding¹

Affiliations:
¹Department of Biostatistics, School of Public Health and Health Professions, University at Buffalo, Buffalo, New York 14214, USA; email: [email protected]
²Department of Statistics, Athens University of Economics and Business, Athens 10434, Greece
Vol. 8:301-327 (Volume publication date March 2021) https://doi.org/10.1146/annurev-statistics-031219-041228
First published as a Review in Advance on September 30, 2020
Copyright © 2021 by Annual Reviews. All rights reserved

Abstract

Statistical distances, divergences, and similar quantities have an extensive history and play an important role in the statistical and related scientific literature. This role shows up in estimation, where we often use estimators based on minimizing a distance. Distances also play a prominent role in hypothesis testing and in model selection. We review distances that are often used in scientific work, present their statistical properties, and show how they compare to each other. We discuss an approximation framework for model-based inference using statistical distances. Emphasis is placed on identifying which statistical distances can be interpreted as loss functions, and in what sense, and on how they can be used for model assessment. We review a special class of distances, the class of quadratic distances, connect it with the classical goodness-of-fit paradigm, and demonstrate its use in the problem of assessing model fit. These methods can be used in analyzing very large samples.

Keywords: goodness-of-fit, model assessment, quadratic distances, robustness, statistical distances, statistical machine learning
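To make the idea of an estimator based on minimizing a distance concrete, the following is a minimal sketch, not taken from the article, of minimum Hellinger distance estimation for a Poisson mean from count data. The function names (`poisson_pmf`, `squared_hellinger`, `min_hellinger_poisson`), the grid step, and the toy data are all assumptions chosen for illustration. The squared Hellinger distance is taken here without the factor 1/2 that some authors use.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability mass at k, computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def squared_hellinger(freqs, lam):
    """Squared Hellinger distance sum_k (sqrt(d_k) - sqrt(m_k))^2.

    d_k are the observed relative frequencies and m_k the Poisson(lam) masses.
    Both sum to one, so the distance equals 2 - 2 * sum_k sqrt(d_k * m_k),
    and only cells with d_k > 0 need to be visited.
    """
    affinity = sum(math.sqrt(d * poisson_pmf(k, lam)) for k, d in freqs.items())
    return 2.0 - 2.0 * affinity


def min_hellinger_poisson(data, step=0.05, tol=1e-8):
    """Illustrative minimum Hellinger distance estimate of a Poisson mean."""
    n = len(data)
    freqs = {k: c / n for k, c in Counter(data).items()}
    upper = max(max(data), 1)

    # Coarse grid search first: the objective can have several local minima
    # when the sample contains outliers, so a purely local search is unsafe.
    grid = [step * i for i in range(1, int(upper / step) + 1)]
    best = min(grid, key=lambda lam: squared_hellinger(freqs, lam))

    # Golden-section refinement in a small bracket around the best grid point.
    a, b = max(best - step, 1e-6), best + step
    inv_phi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if squared_hellinger(freqs, c) < squared_hellinger(freqs, d):
            b = d
        else:
            a = c
    return (a + b) / 2


# Toy data (hypothetical): 95 counts shaped like Poisson(2) plus 5 outliers at 20.
clean = [0] * 13 + [1] * 26 + [2] * 26 + [3] * 17 + [4] * 9 + [5] * 3 + [6] * 1
data = clean + [20] * 5

print("sample mean:", sum(data) / len(data))
print("minimum Hellinger estimate:", min_hellinger_poisson(data))
```

In this toy sample the five outliers pull the sample mean (the Poisson maximum likelihood estimate) up to 2.86, while the minimum Hellinger estimate stays close to 2. This illustrates the robustness listed among the keywords, under the assumptions stated above.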


