Measuring and testing homogeneity of distributions by characteristic distance

Li, Xu; Hu, Wenjuan; Zhang, Baoxue

doi:10.1007/s00362-022-01327-7


Regular Article
Published: 13 June 2022

Statistical Papers, Volume 64, pages 529–556 (2023)


Abstract

Technological advances have made it possible to collect large amounts of complex data objects, and the homogeneity structure among such objects is widely used in statistics. However, existing measures of homogeneity are subject to restrictions, such as moment conditions and parametric assumptions. To overcome this limitation, this paper introduces the characteristic distance, a novel metric that completely characterizes the homogeneity of two distributions. The proposed distance has several desirable properties: (i) it is distribution-free (nonparametric) and is therefore robust to the form of the underlying data distribution; (ii) it is nonnegative and equals zero if and only if the two distributions are homogeneous, i.e., identical; (iii) it has a clear and intuitive probabilistic interpretation, and its empirical version is easy to compute and reduces to a sum of two V-statistics. Theoretically, the asymptotic distributions are thoroughly investigated, including a mixture of $\chi^{2}$ distributions under the null hypothesis and asymptotic normality under the alternative hypothesis. Simulation studies and a real data application suggest that the test based on the empirical characteristic distance has favorable power in detecting departures from homogeneity.
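The abstract does not reproduce the definition of the characteristic distance, so the following is only a sketch of a closely related, standard construction used as a stand-in (an assumption, not the authors' statistic): the weighted $L^2$ distance between empirical characteristic functions, $\int |\hat\varphi_X(t)-\hat\varphi_Y(t)|^2 w(t)\,dt$, with the weight $w$ assumed to be the $N(0, s^2 I_d)$ density. Because $\int e^{i t^\top (x-y)} w(t)\,dt = \exp(-s^2\|x-y\|^2/2)$, the integral has a closed form made of V-statistic double sums (it coincides with the biased Gaussian-kernel maximum mean discrepancy). It is nonnegative by construction, and its population version is zero only when the two characteristic functions, and hence the distributions, coincide. The function names and the scale $s$ are hypothetical, and the p-value is obtained by permutation rather than from the $\chi^{2}$-mixture null limit described above.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_cf_kernel(x: Vector, y: Vector, scale: float) -> float:
    """exp(-scale^2 * ||x - y||^2 / 2), i.e. the integral of exp(i t'(x - y))
    against an assumed N(0, scale^2 I) weight on the frequency variable t."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * scale * scale * sq)


def cf_distance(xs: List[Vector], ys: List[Vector], scale: float = 1.0) -> float:
    """Hypothetical stand-in statistic: weighted L2 distance between the two
    empirical characteristic functions, written as V-statistic double sums
    (diagonal terms i == j included). Nonnegative up to rounding error."""
    n, m = len(xs), len(ys)
    kxx = sum(gaussian_cf_kernel(a, b, scale) for a in xs for b in xs) / (n * n)
    kyy = sum(gaussian_cf_kernel(a, b, scale) for a in ys for b in ys) / (m * m)
    kxy = sum(gaussian_cf_kernel(a, b, scale) for a in xs for b in ys) / (n * m)
    return kxx + kyy - 2.0 * kxy


def permutation_pvalue(xs: List[Vector], ys: List[Vector], scale: float = 1.0,
                       n_perm: int = 199, seed: int = 0) -> float:
    """Permutation p-value: shuffle the pooled sample, split it back into
    groups of the original sizes, recompute the statistic, and count how often
    it is at least as large as the observed value."""
    rng = random.Random(seed)
    observed = cf_distance(xs, ys, scale)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if cf_distance(pooled[:n], pooled[n:], scale) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator keeps the test valid at finite n_perm.
    return (exceed + 1) / (n_perm + 1)
```

The Gaussian weight is chosen here only because it gives a closed form; other weights with full support would also keep the "zero only under homogeneity" property, at the cost of a different kernel.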



Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Grant No. 12071267).

Author information

Authors and Affiliations

School of Statistics, Capital University of Economics and Business, Fengtai, 100070, Beijing, China
Xu Li, Wenjuan Hu & Baoxue Zhang


Corresponding author

Correspondence to Baoxue Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Li, X., Hu, W. & Zhang, B. Measuring and testing homogeneity of distributions by characteristic distance. Stat Papers 64, 529–556 (2023). https://doi.org/10.1007/s00362-022-01327-7


Received: 21 October 2021
Revised: 19 March 2022
Accepted: 13 May 2022
Published: 13 June 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00362-022-01327-7
