A stationary bootstrap test about two mean vectors comparison with somewhat dense differences and fewer sample size than dimension

Li, Zhengbang; Liu, Fuxiang; Zeng, Luanjie; Zuo, Guoxin

doi:10.1007/s00180-020-01030-x

Original paper
Published: 02 September 2020

Volume 36, pages 941–960, (2021)

Computational Statistics

Zhengbang Li¹,
Fuxiang Liu²,
Luanjie Zeng¹ &
Guoxin Zuo ORCID: orcid.org/0000-0001-5447-2263¹

Abstract

Hypothesis tests that compare two sample mean vectors often arise in modern biostatistics. Many tests have been proposed for detecting relatively dense signals, that is, mean vector differences with a fairly large number of nonzero components. One class of these tests is based on quadratic forms in the difference of the two sample mean vectors. Another class is based on quadratic forms in a studentized version of that difference. In this article, we propose a bootstrap test that adopts the stationary bootstrap scheme to calculate the p value of a typical test based on a quadratic form in the studentized difference of the two sample mean vectors. Extensive simulations are conducted to compare the performance of the bootstrap test with that of other typical existing tests. We also apply the bootstrap test to a real genetic data set on breast cancer.
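As a rough illustration of the resampling scheme named in the abstract, the sketch below generates stationary bootstrap indices in the sense of Politis and Romano: blocks start at uniformly random positions, have geometrically distributed lengths, and wrap around the end of the sequence. The function name `stationary_bootstrap_indices` and the parameter `mean_block_length` are hypothetical, and the sketch does not reproduce how the article applies the scheme to the studentized statistic, which the abstract does not specify.

```python
import numpy as np


def stationary_bootstrap_indices(n, mean_block_length, rng):
    """Return n resampled positions in 0..n-1 (illustrative sketch only).

    Each step either continues the current block (moving to the next
    position, wrapping circularly) or starts a new block at a uniformly
    random position. A new block starts with probability
    1 / mean_block_length, so block lengths are geometric.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if mean_block_length < 1:
        raise ValueError("mean_block_length must be at least 1")
    restart_prob = 1.0 / mean_block_length
    indices = np.empty(n, dtype=int)
    indices[0] = rng.integers(n)  # first block starts at a random position
    for t in range(1, n):
        if rng.random() < restart_prob:
            indices[t] = rng.integers(n)  # start a new block
        else:
            indices[t] = (indices[t - 1] + 1) % n  # continue block, wrap around
    return indices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.arange(12)  # placeholder sequence, not data from the article
    idx = stationary_bootstrap_indices(len(series), mean_block_length=4.0, rng=rng)
    print(series[idx])  # one stationary bootstrap resample of the sequence
```

A larger `mean_block_length` preserves more of the dependence between neighbouring entries, while a value of 1 reduces the scheme to the ordinary independent bootstrap.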

Acknowledgements

Research of Z. Li and L. Zeng is partially supported by the self-determined research funds of CCNU from the colleges basic research of MOE (CCNU20TS002). Research of F. Liu is partially supported by Humanity and Social Science foundation of MOE of China (17YJA630066) and China Natural Science Fund (11601267).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Central China Normal University, Wuhan, China
Zhengbang Li, Luanjie Zeng & Guoxin Zuo
College of Science and Three Gorges Mathematical Research Center, China Three Gorges University, Yichang, China
Fuxiang Liu

Corresponding author

Correspondence to Zhengbang Li.

Appendix

1.1 Proof of Theorem 1

Proof

For any $p$-dimensional real vector $\mathbf{l }=(l_1,l_2,\ldots ,l_p)'_{p\times 1}$ with $\vert \vert \mathbf{l }\vert \vert ^2_2=\sum \nolimits _{s=1}^p l_s^2<M_p$, denote by $f_X(t)$ the characteristic function of the one-dimensional random variable $ \mathbf{l }' \mathbf{X } =\sum \nolimits ^p_{q=1}l_qX_q$, and by $f_Y(t)$ the characteristic function of the one-dimensional random variable $ \mathbf{l }' \mathbf{Y } =\sum \nolimits ^p_{q=1}l_qY_q$. For any positive integer $p$, there exists a positive real number $\alpha $ such that the conditions $\frac{\mathbf{l }'\Sigma _1\mathbf{l }}{p^\alpha }<C$ and $ \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{p^\alpha }<C $ are satisfied, where $C$ is a finite positive constant that does not depend on $p$. Then we obtain

$$\begin{aligned} f_X(t) = 1 + \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}(it) - \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2), \end{aligned}$$

and

$$\begin{aligned} f_Y(t) = 1 + \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}(it) - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2). \end{aligned}$$

The characteristic function of $\left( \frac{(m+n)^{1/2}l_1({{\bar{X}}}_1 -\bar{Y}_1)}{p^{\alpha /2}}+\cdots +\frac{(m+n)^{1/2}l_p({{\bar{X}}}_p -\bar{Y}_p)}{p^{\alpha /2}}\right) $ is

$$\begin{aligned} g_1(t;l_1,l_2,\ldots ,l_p)= & {} \left[ f_X\left( \frac{(m+n)^{1/2}}{m}t\right) \right] ^m \left[ f_Y\left( -\frac{(m+n)^{1/2}}{n}t\right) \right] ^n\\= & {} \left[ 1 + \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{m}(it) - \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m^2}t^2 + o\left( \frac{(m+n)}{m^2}t^2\right) \right] ^m\\&\left[ 1 - \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{n}(it) - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{n^2} t^2 + o\left( \frac{(m+n)}{n^2}t^2\right) \right] ^n . \end{aligned}$$

Under the null hypothesis, as $m,n \rightarrow \infty $ with $\frac{m}{m+n} \rightarrow \theta $, we have

$$\begin{aligned}&\log [g_1(t;l_1,l_2,\ldots ,l_p)] \\&\quad = m \log \left[ 1 + \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{m}(it) - \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m^2}t^2 + o\left( \frac{(m+n)}{m^2}t^2\right) \right] \\&\qquad + n \log \left[ 1 - \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{n}(it) - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }} \frac{(m+n)}{n^2}t^2 + o\left( \frac{(m+n)}{n^2}t^2\right) \right] \\&\quad =- \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m}t^2 - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }} \frac{(m+n)}{n}t^2 + o\left( \frac{(m+n)}{m}t^2\right) + o\left( \frac{(m+n)}{n}t^2\right) \\&\quad =-\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2). \end{aligned}$$

Namely, $\log [g_1(t;l_1,l_2,\ldots ,l_p)] = -\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2)$. Letting $t=1$, we obtain

$$\begin{aligned} \log [h_1(l_1,l_2,\ldots ,l_p)] = \log [g_1(1;l_1,l_2,\ldots ,l_p)] = -\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }} + {o(1)}, \end{aligned}$$

as $m,n \rightarrow \infty $. This completes the proof of Theorem 1. $\square $
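As a reading aid for the second equality in the expansion of $\log [g_1]$ above, the following sketch collects the terms linear in $it$ and the quadratic term in $\Sigma _1$, using only the first-order approximation $\log (1+x)\approx x$; this is an informal check of the bookkeeping, not a replacement for the argument in the proof.

$$\begin{aligned}&m\cdot \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{m}(it) - n\cdot \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{n}(it) = \frac{(m+n)^{1/2}}{p^{\alpha /2}}\,\mathbf{l }'(\mu _1-\mu _2)(it) = 0 \quad \text {when } \mu _1=\mu _2,\\&m\cdot \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m^2}t^2 = \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m}t^2 . \end{aligned}$$

The $\Sigma _2$ term is handled in the same way with $n$ in place of $m$.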

1.2 Proof of $T_{m,n,p}= \sum \nolimits _{j=1}^{p}\left( {{\bar{X}}}_{j}-{{\bar{Y}}}_{j}\right) ^{2} -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) $ under the null hypothesis $\varvec{\mu }_1=\varvec{\mu }_2$

Proof

We can obtain that,

$$\begin{aligned}&T_{m,n,p}-\left\| \varvec{\mu }_{1}-\varvec{\mu }_{2}\right\| ^{2}\\&\quad = \frac{\sum \nolimits _{(i \ne j)=1}^{m} ({\varvec{X}}^{(i)})^{T} {\varvec{X}}^{(j)}}{m\left( m-1\right) } +\frac{\sum \nolimits _{(i \ne j)=1}^{n} ({\varvec{Y}}^{(i)})^{T} {\varvec{Y}}^{(j)}}{n\left( n-1\right) }-2 \frac{\sum \nolimits _{i=1}^{m} \sum \nolimits _{j=1}^{n} ({\varvec{X}}^{(i)})^{T} {\varvec{Y}}^{(j)}}{n m}\\&\quad = \frac{\sum \nolimits _{i ,j=1}^{m} ({\varvec{X}}^{(i)})^{T} {\varvec{X}}^{(j)}-\sum \nolimits _{i =1}^{m} ({\varvec{X}}^{(i)})^{T} {\varvec{X}}^{(i)}}{m\left( m-1\right) } + \frac{\sum \nolimits _{i ,j=1}^{n} ({\varvec{Y}}^{(i)})^{T} {\varvec{Y}}^{(j)}-\sum \nolimits _{i =1}^{n} ({\varvec{Y}}^{(i)})^{T} {\varvec{Y}}^{(i)}}{n\left( n-1\right) }\\&\qquad -2 \frac{\sum \nolimits _{i=1}^{m} \sum \nolimits _{j=1}^{n} ({\varvec{X}}^{(i)})^{T} {\varvec{Y}}^{(j)}}{n m} \\&\quad = \frac{m^2 {\bar{{\varvec{X}}}} ^T {\bar{{\varvec{X}}}} -(m-1){\text {tr}} \widehat{\varvec{\Sigma }}_1-m {\bar{{\varvec{X}}}}^T {\bar{{\varvec{X}}}}}{m\left( m-1\right) } + \frac{n^2 {\bar{{\varvec{Y}}}} ^T {\bar{{\varvec{Y}}}} -(n-1){\text {tr}} \widehat{\varvec{\Sigma }}_2-n {\bar{{\varvec{Y}}}}^T {\bar{{\varvec{Y}}}}}{n\left( n-1\right) } \\&\qquad -2 \frac{mn {\bar{{\varvec{X}}}}^{T}{\bar{{\varvec{Y}}}}}{n m}\\&\quad = ({\bar{{\varvec{X}}}}-{\bar{{\varvec{Y}}}})^T({\bar{{\varvec{X}}}}-{\bar{{\varvec{Y}}}}) -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) \\&\quad = \sum \limits _{j=1}^{p}\left( {{\bar{X}}}_{j}-\bar{Y}_{j}\right) ^{2} -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) . \end{aligned}$$

Under the null hypothesis $\varvec{\mu }_1=\varvec{\mu }_2$, the term $\left\| \varvec{\mu }_{1}-\varvec{\mu }_{2}\right\| ^{2}$ vanishes, which gives the stated expression for $T_{m,n,p}$. $\square $
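The algebra in this derivation can be checked numerically. The sketch below computes the sum over pairs $i \ne j$ from the first line and the closed form from the last line, and compares them on simulated data. The function names and the simulated inputs are hypothetical, and the sample covariance is assumed to use the divisor $m-1$ (respectively $n-1$), as the factor $(m-1)$ in front of the trace in the derivation suggests.

```python
import numpy as np


def pairwise_form(X, Y):
    """Sum over i != j of inner products, as in the first line of the derivation.

    X has shape (m, p) and Y has shape (n, p); rows are observations.
    """
    m, n = X.shape[0], Y.shape[0]
    gram_x = X @ X.T  # all inner products X(i)' X(j)
    gram_y = Y @ Y.T  # all inner products Y(i)' Y(j)
    term_x = (gram_x.sum() - np.trace(gram_x)) / (m * (m - 1))  # drop i == j
    term_y = (gram_y.sum() - np.trace(gram_y)) / (n * (n - 1))  # drop i == j
    cross = 2.0 * (X @ Y.T).sum() / (m * n)
    return term_x + term_y - cross


def closed_form(X, Y):
    """Squared mean difference minus the scaled covariance traces (last line)."""
    m, n = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    # Trace of the sample covariance = sum of per-coordinate sample variances.
    # Assumption: divisor (sample size - 1), hence ddof=1.
    trace_1 = X.var(axis=0, ddof=1).sum()
    trace_2 = Y.var(axis=0, ddof=1).sum()
    return diff @ diff - (trace_1 / m + trace_2 / n)


if __name__ == "__main__":
    rng = np.random.default_rng(2020)
    X = rng.normal(size=(6, 40))  # m = 6 observations, p = 40 > m
    Y = rng.normal(size=(8, 40))  # n = 8 observations
    print(np.isclose(pairwise_form(X, Y), closed_form(X, Y)))
```

Because the identity is purely algebraic, the two functions agree for any data, not only when the two population means coincide.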

About this article

Cite this article

Li, Z., Liu, F., Zeng, L. et al. A stationary bootstrap test about two mean vectors comparison with somewhat dense differences and fewer sample size than dimension. Comput Stat 36, 941–960 (2021). https://doi.org/10.1007/s00180-020-01030-x

Received: 10 August 2019
Accepted: 27 August 2020
Published: 02 September 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00180-020-01030-x
