Abstract
We propose a method for testing for a shift between the mean vectors of two multivariate Gaussian random variables in a high-dimensional setting, accounting for possible dependence between the dimensions and allowing \(p > n\). The method combines two well-known tests: the Hotelling test and the Simes test. At each iteration, a subset of the dimensions is sampled and tested with the Hotelling test; the resulting p-values are then combined using the Simes test. We prove that this procedure is asymptotically valid. The procedure can be extended to handle unequal covariance matrices by plugging in the appropriate extension of the Hotelling test. In a simulation study, we show that the proposed test has an advantage over state-of-the-art tests in many scenarios and is robust to violations of the Gaussian assumption.
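The sample-test-combine idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `subset_size`, `n_subsets` and `seed` parameters, and the choice to draw each subset independently and uniformly at random are all assumptions made for illustration; the abstract does not specify how dimensions are sampled or how many iterations are used. The per-subset test is the classical pooled-covariance two-sample Hotelling \(T^2\) test, which requires the subset size to be smaller than \(n_1 + n_2 - 1\).

```python
import numpy as np
from scipy import stats


def hotelling_pvalue(x, y):
    """Two-sample Hotelling T^2 test (pooled covariance); returns a p-value."""
    n1, k = x.shape
    n2 = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance, i.e. the equal-covariance assumption.
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    # Under H0 the rescaled statistic follows F(k, n1 + n2 - k - 1).
    f_stat = t2 * (n1 + n2 - k - 1) / (k * (n1 + n2 - 2))
    return float(stats.f.sf(f_stat, k, n1 + n2 - k - 1))


def simes_pvalue(pvalues):
    """Simes combination: min over i of m * p_(i) / i for sorted p-values."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    return float(np.min(m * p / np.arange(1, m + 1)))


def sampled_hotelling_simes(x, y, subset_size, n_subsets, seed=0):
    """Test random subsets of dimensions with Hotelling, combine with Simes.

    Assumption: subsets are drawn independently and uniformly at random;
    the sampling scheme of the original method may differ.
    """
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    pvals = []
    for _ in range(n_subsets):
        cols = rng.choice(p, size=subset_size, replace=False)
        pvals.append(hotelling_pvalue(x[:, cols], y[:, cols]))
    return simes_pvalue(pvals)
```

With two samples of 20 observations each in \(p = 50\) dimensions, a call such as `sampled_hotelling_simes(x, y, subset_size=5, n_subsets=30)` returns a single global p-value, even though the full-dimensional Hotelling test is undefined here because the pooled covariance matrix is singular when \(p > n\).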
Acknowledgements
We would like to thank the reviewers for their thoughtful comments and efforts toward improving our manuscript. The research leading to these results has received funding from the European Research Council under ERC grant agreement PSARPS (294519).
Cite this article
Frostig, T., Benjamini, Y. Testing the equality of multivariate means when \(p>n\) by combining the Hotelling and Simes tests. TEST 31, 390–415 (2022). https://doi.org/10.1007/s11749-021-00781-z