Abstract
Hypothesis testing problems that compare two mean vectors often arise in modern biostatistics. Many tests have been proposed for detecting relatively dense signals, that is, differences between the two mean vectors with somewhat dense nonzero components. One class of these tests is based on quadratic forms of the difference between the two sample mean vectors; another is based on quadratic forms of a studentized version of that difference. In this article, we propose a bootstrap test that adopts the stationary bootstrap scheme to calculate the p-value of a typical test based on a quadratic form of the studentized difference between the two sample mean vectors. Extensive simulations are conducted to compare the performance of the bootstrap test with that of other existing typical tests. We also apply the bootstrap test to the analysis of real genetic data on breast cancer.
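The stationary bootstrap scheme mentioned above is due to Politis and Romano (1994). A resample is built from blocks that start at uniformly random positions, have geometrically distributed lengths with a chosen mean, and wrap around the end of the series. The sketch below implements only this generic resampling step for a sequence and uses the sample mean as an illustrative statistic. Which sequence the proposed test resamples, how the mean block length is chosen, and how the p-value is formed are not described in this abstract, so those parts and the function names are assumptions.

```python
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """Indices 0..n-1 resampled by the Politis-Romano stationary bootstrap.

    Each block starts at a uniformly random position, block lengths are
    geometric with mean `mean_block`, and blocks wrap around circularly.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if mean_block < 1:
        raise ValueError("mean_block must be at least 1")
    p_new = 1.0 / mean_block  # probability of starting a new block at each step
    idx = [rng.randrange(n)]  # the first block starts at a random position
    while len(idx) < n:
        if rng.random() < p_new:
            idx.append(rng.randrange(n))  # start a new block
        else:
            idx.append((idx[-1] + 1) % n)  # extend the current block, wrapping at the end
    return idx


def stationary_bootstrap_means(series, mean_block, n_boot, seed=0):
    """Bootstrap replicates of the sample mean of `series` (illustrative statistic only)."""
    rng = random.Random(seed)
    n = len(series)
    replicates = []
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        replicates.append(sum(series[i] for i in idx) / n)
    return replicates
```

Setting `mean_block=1` makes every step start a new block, which reduces the scheme to Efron's i.i.d. bootstrap; larger values keep longer runs of consecutive observations and so preserve more of the serial dependence.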
Acknowledgements
Research of Z. Li and L. Zeng is partially supported by the self-determined research funds of CCNU from the colleges' basic research of MOE (CCNU20TS002). Research of F. Liu is partially supported by the Humanity and Social Science Foundation of MOE of China (17YJA630066) and the China Natural Science Fund (11601267).
Appendix
1.1 Proof of Theorem 1
Proof
For any p-dimensional real vector \(\mathbf{l}=(l_1,l_2,\ldots ,l_p)'_{p\times 1}\) with \(\Vert \mathbf{l}\Vert ^2_2=\sum \nolimits _{s=1}^p l_s^2<M_p\), denote by \(f_X(t)\) the characteristic function of the univariate random variable \(\mathbf{l}'\mathbf{X}=\sum \nolimits ^p_{q=1}l_qX_q\), and by \(f_Y(t)\) the characteristic function of the univariate random variable \(\mathbf{l}'\mathbf{Y}=\sum \nolimits ^p_{q=1}l_qY_q\). Suppose that for any positive integer p there exists a positive real number \(\alpha \) such that \(\frac{\mathbf{l}'\Sigma _1\mathbf{l}}{p^\alpha }<C\) and \(\frac{\mathbf{l}'\Sigma _2\mathbf{l}}{p^\alpha }<C\), where C is a finite positive constant that does not depend on p. Then we obtain
and
The characteristic function of \(\sum \nolimits _{q=1}^{p}\frac{(m+n)^{1/2}\,l_q({{\bar{X}}}_q-\bar{Y}_q)}{p^{\alpha /2}}\) is
Under the null hypothesis, as \(m,n \rightarrow \infty \), we have
That is, \(\log [g_1(t;l_1,l_2,\ldots ,l_p)] = -\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2)\). Letting \(t=1\), we obtain
as \(m,n \rightarrow \infty \). This completes the proof of Theorem 1. \(\square \)
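As a numerical illustration of Theorem 1, the sketch below draws \((m+n)^{1/2}\mathbf{l}'(\bar{\mathbf{X}}-\bar{\mathbf{Y}})/p^{\alpha /2}\) repeatedly under \(\varvec{\mu }_1=\varvec{\mu }_2\) and compares its sample variance with \(\mathbf{l}'(\Sigma _1/\theta +\Sigma _2/(1-\theta ))\mathbf{l}/p^{\alpha }\). It assumes that \(\Sigma _1\) and \(\Sigma _2\) are the covariance matrices of \(\mathbf{X}\) and \(\mathbf{Y}\), that \(\theta \) is the limit of \(m/(m+n)\) (replaced here by \(m/(m+n)\) itself), and, for simplicity, diagonal covariance matrices with centred exponential coordinates; the function names are hypothetical. Because \(\mathrm{Var}\{(m+n)^{1/2}\mathbf{l}'(\bar{\mathbf{X}}-\bar{\mathbf{Y}})\} = (m+n)(\mathbf{l}'\Sigma _1\mathbf{l}/m+\mathbf{l}'\Sigma _2\mathbf{l}/n) = \mathbf{l}'(\Sigma _1/\theta +\Sigma _2/(1-\theta ))\mathbf{l}\) exactly when \(\theta =m/(m+n)\), the variances should agree closely; what the theorem adds is that the distribution is approximately normal with this variance.

```python
import math
import random
import statistics


def simulate_scaled_difference(m, n, l, sd1, sd2, alpha, reps, seed=0):
    """Monte Carlo draws of (m+n)^{1/2} l'(Xbar - Ybar) / p^{alpha/2} with mu1 = mu2 = 0.

    Illustrative assumptions: coordinates are independent (diagonal Sigma_1,
    Sigma_2 with standard deviations sd1, sd2), and each coordinate is a scaled,
    centred Exp(1) variable, so the data are deliberately non-Gaussian.
    """
    rng = random.Random(seed)
    p = len(l)
    draws = []
    for _ in range(reps):
        xbar = [sum(sd1[q] * (rng.expovariate(1.0) - 1.0) for _ in range(m)) / m for q in range(p)]
        ybar = [sum(sd2[q] * (rng.expovariate(1.0) - 1.0) for _ in range(n)) / n for q in range(p)]
        lin = sum(l[q] * (xbar[q] - ybar[q]) for q in range(p))
        draws.append(math.sqrt(m + n) * lin / p ** (alpha / 2))
    return draws


def limiting_variance(m, n, l, sd1, sd2, alpha):
    """l'(Sigma_1/theta + Sigma_2/(1 - theta)) l / p^alpha, theta = m/(m+n), diagonal Sigmas."""
    theta = m / (m + n)
    p = len(l)
    quad = sum(l[q] ** 2 * (sd1[q] ** 2 / theta + sd2[q] ** 2 / (1 - theta)) for q in range(p))
    return quad / p ** alpha
```

For instance, with \(m=40\), \(n=60\), \(\mathbf{l}=(1,-0.5,2)'\), standard deviations \((1,2,0.5)\) and \((1.5,1,1)\), and \(\alpha =1\), `limiting_variance` returns \(55/9\approx 6.11\), and the sample variance of 2000 draws from `simulate_scaled_difference` (computed, e.g., with `statistics.variance`) should be close to this value.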
1.2 Proof of \(T_{m,n,p}= \sum \nolimits _{j=1}^{p}\left( {{\bar{X}}}_{j}-{{\bar{Y}}}_{j}\right) ^{2} -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) \) under the null hypothesis \(\varvec{\mu }_1=\varvec{\mu }_2\)
Proof
We obtain
Under the null hypothesis \(\varvec{\mu }_1=\varvec{\mu }_2\), the above conclusion follows. \(\square \)
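The sketch below computes \(T_{m,n,p}\) as defined in the heading of this subsection and checks by simulation one property that follows directly from that definition, assuming independent samples and taking \(\widehat{\varvec{\Sigma }}_1\), \(\widehat{\varvec{\Sigma }}_2\) to be the unbiased sample covariance matrices (an assumption, since the estimators are not defined in this section). Since \(E(\bar{X}_j-\bar{Y}_j)^2=(\mu _{1j}-\mu _{2j})^2+\sigma _{1,jj}/m+\sigma _{2,jj}/n\) and \(E\,{\text {tr}}\widehat{\varvec{\Sigma }}_i={\text {tr}}\varvec{\Sigma }_i\), we have \(E(T_{m,n,p})=\Vert \varvec{\mu }_1-\varvec{\mu }_2\Vert _2^2\), which is zero under the null hypothesis. The Gaussian data, the common mean shift `delta`, and the function names are illustrative choices.

```python
import random


def t_statistic(x, y):
    """T_{m,n,p} = sum_j (Xbar_j - Ybar_j)^2 - (tr(S1)/m + tr(S2)/n).

    x holds m observations and y holds n observations, each a length-p list.
    S1, S2 are the unbiased sample covariance matrices (divisors m - 1 and
    n - 1); only their traces, i.e. the sums of per-coordinate variances, are needed.
    """
    m, n, p = len(x), len(y), len(x[0])
    xbar = [sum(row[j] for row in x) / m for j in range(p)]
    ybar = [sum(row[j] for row in y) / n for j in range(p)]
    tr1 = sum(sum((row[j] - xbar[j]) ** 2 for row in x) for j in range(p)) / (m - 1)
    tr2 = sum(sum((row[j] - ybar[j]) ** 2 for row in y) for j in range(p)) / (n - 1)
    return sum((xbar[j] - ybar[j]) ** 2 for j in range(p)) - (tr1 / m + tr2 / n)


def mean_t_under_shift(m, n, p, delta, reps, seed=0):
    """Monte Carlo average of T when every coordinate of mu1 - mu2 equals delta.

    Coordinates are independent N(., 1), so E(T) = p * delta^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [[rng.gauss(delta, 1.0) for _ in range(p)] for _ in range(m)]
        y = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        total += t_statistic(x, y)
    return total / reps
```

With \(p=60\) and \(m+n=20<p\), `mean_t_under_shift(8, 12, 60, 0.0, 300)` should be close to 0, while a shift of `delta=0.5` should give an average close to \(60\times 0.5^2=15\).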
Cite this article
Li, Z., Liu, F., Zeng, L. et al. A stationary bootstrap test about two mean vectors comparison with somewhat dense differences and fewer sample size than dimension. Comput Stat 36, 941–960 (2021). https://doi.org/10.1007/s00180-020-01030-x