Abstract
Given k independent samples of finite but arbitrary dimension, this paper deals with the problem of testing for the equality of their distributions, which can be continuous, discrete or mixed. In contrast to the classical setting, where k is assumed to be fixed and the sample size from each population increases without bound, here k is assumed to be large and the size of each sample is either bounded or small in comparison with k. The asymptotic distribution of two test statistics is stated under the null hypothesis of equality of the k distributions as well as under alternatives, which allows us to study the asymptotic power of the resulting tests. Specifically, it is shown that both test statistics are asymptotically distribution-free under the null hypothesis. The finite sample performance of the tests based on the asymptotic null distribution is studied via simulation. An application of the proposal to a real data set is included. The use of the proposed procedure for infinite-dimensional data, as well as other possible extensions, is discussed.
Acknowledgements
The authors thank the Associate Editor and two anonymous referees for their constructive comments and suggestions, which helped to improve the presentation. M.D. Jiménez-Gamero has been partially supported by Grants MTM2017-89422-P (Spanish Ministry of Economy, Industry and Competitiveness, the State Agency of Investigation, the European Regional Development Fund) and P18-FR-2369 (Junta de Andalucía). M. Cousido-Rocha has received financial support from the SiDOR research group through the Grant Competitive Reference Group, 2016-2019 (ED431C 2016/040), funded by the Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia, and also from Grant MTM2017-89422-P. M.V. Alba-Fernández and F. Jiménez-Jiménez acknowledge the financial support provided by the Grant EI_SEJ5_2019 (Universidad de Jaén).
Cite this article
Jiménez-Gamero, M.D., Cousido-Rocha, M., Alba-Fernández, M.V. et al. Testing the equality of a large number of populations. TEST 31, 1–21 (2022). https://doi.org/10.1007/s11749-021-00769-9