Measuring and testing homogeneity of distributions by characteristic distance

  • Regular Article
Statistical Papers

Abstract

Technological advances have made it possible to collect large numbers of complex data objects, and the homogeneity structure among such objects is widely used in statistics. However, existing metrics of homogeneity are subject to restrictions, such as moment and parametric assumptions. To overcome these limitations, this paper introduces the characteristic distance, a novel metric that completely characterizes the homogeneity of two distributions. The proposed distance has several desirable statistical properties: (i) it is distribution-free (nonparametric) and is therefore robust to the form of the data distribution; (ii) it is nonnegative and equals zero if and only if the two distributions are homogeneous; (iii) it has a clear and intuitive probabilistic interpretation, and its empirical version is easy to compute and reduces to a sum of two V-statistics. Theoretically, the asymptotic distributions of the empirical version are thoroughly investigated: a mixture of \(\chi ^{2}\) distributions under the null hypothesis and asymptotic normality under the alternative hypothesis. Simulation studies and a real data application suggest that the empirical characteristic distance has favorable power in testing the homogeneity of distributions.
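The abstract does not state the formula of the characteristic distance, so the following is only an illustrative sketch of the general idea of a characteristic-function-based two-sample statistic, not the authors' method. It assumes a stand-in statistic: the integral of the squared difference between the two empirical characteristic functions against a standard normal weight, which has the closed V-statistic form used below. It also assumes permutation calibration in place of the asymptotic null distribution. The function names `gaussian_kernel_mean`, `ecf_distance` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def gaussian_kernel_mean(a, b):
    """Mean of exp(-||a_i - b_j||^2 / 2) over all pairs (i, j)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist).mean()


def ecf_distance(x, y):
    """Stand-in statistic (an assumption, not the paper's definition).

    Equals the integral of |phi_x(t) - phi_y(t)|^2 against a standard
    normal weight in t, where phi_x and phi_y are the empirical
    characteristic functions of the two samples. Written with V-statistics,
    i.e. averages over all pairs including i == j.
    """
    return (gaussian_kernel_mean(x, x)
            + gaussian_kernel_mean(y, y)
            - 2.0 * gaussian_kernel_mean(x, y))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Calibrate the statistic by randomly relabelling the pooled sample."""
    rng = np.random.default_rng(seed)
    observed = ecf_distance(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = ecf_distance(pooled[idx[:n]], pooled[idx[n:]])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=(50, 3))            # sample from N(0, I)
    y = rng.normal(loc=0.8, size=(50, 3))   # mean-shifted sample
    stat, pval = permutation_pvalue(x, y)
    print(f"statistic = {stat:.4f}, permutation p-value = {pval:.4f}")
```

Like property (ii) above, this stand-in statistic is nonnegative by construction, because it is an integral of a squared modulus. The paper's own statistic and its limiting distributions may differ in every other respect.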

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Grant No. 12071267).

Author information

Corresponding author

Correspondence to Baoxue Zhang.

About this article

Cite this article

Li, X., Hu, W. & Zhang, B. Measuring and testing homogeneity of distributions by characteristic distance. Stat Papers 64, 529–556 (2023). https://doi.org/10.1007/s00362-022-01327-7

  • DOI: https://doi.org/10.1007/s00362-022-01327-7
