A stationary bootstrap test about two mean vectors comparison with somewhat dense differences and fewer sample size than dimension


Abstract

Hypothesis testing problems that compare two sample mean vectors arise frequently in modern biostatistics. Many tests have been proposed for detecting relatively dense signals, that is, differences between mean vectors whose nonzero components are somewhat dense. One class of such tests is based on quadratic forms of the difference between the two sample mean vectors; another is based on quadratic forms of a studentized version of this difference. In this article, we propose a bootstrap test that adopts the stationary bootstrap scheme to calculate the p value of a typical test based on a quadratic form of the studentized difference between the two sample mean vectors. Extensive simulations are conducted to compare the performance of the bootstrap test with that of other existing tests. We also apply the bootstrap test to a real genetic data analysis of breast cancer.
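As a concrete illustration of the resampling mechanism, the sketch below implements the stationary bootstrap of Politis and Romano (1994): blocks start at uniformly random positions, block lengths are geometric with mean 1/prob, and indices wrap around circularly. The function names, the simple centering-based p value recipe, and the choice of statistic are illustrative assumptions only, not the authors' implementation; the statistic actually calibrated in the paper is a quadratic form of the studentized difference between the two sample mean vectors, as described above.

```python
import numpy as np

def stationary_bootstrap_indices(n, prob, rng):
    """One stationary-bootstrap resample of indices 0..n-1 (Politis and Romano 1994).

    Blocks start at uniformly random positions, block lengths are geometric
    with mean 1/prob, and indices wrap around circularly.
    """
    idx = np.empty(n, dtype=int)
    filled = 0
    while filled < n:
        start = rng.integers(n)                  # uniform block start
        length = rng.geometric(prob)             # geometric block length, mean 1/prob
        block = (start + np.arange(length)) % n  # circular wrapping
        take = min(length, n - filled)
        idx[filled:filled + take] = block[:take]
        filled += take
    return idx

def stationary_bootstrap_pvalue(z, stat, prob=0.1, B=2000, seed=0):
    """Illustrative p value recipe: recompute `stat` on resamples of the
    mean-centered sequence z (centering imposes the null) and compare with
    the observed value of `stat` on z itself."""
    rng = np.random.default_rng(seed)
    t_obs = stat(z)
    z0 = z - z.mean()
    t_boot = np.array([stat(z0[stationary_bootstrap_indices(len(z), prob, rng)])
                       for _ in range(B)])
    return (1.0 + np.sum(t_boot >= t_obs)) / (B + 1.0)
```

For example, if z holds coordinate-wise studentized mean differences, stat could be a sum of squares such as lambda z: np.sum(z**2); the expected block length 1/prob and the number of resamples B are tuning choices.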



Acknowledgements

Research of Z. Li and L. Zeng is partially supported by the self-determined research funds of CCNU from the colleges basic research of MOE (CCNU20TS002). Research of F. Liu is partially supported by Humanity and Social Science foundation of MOE of China (17YJA630066) and China Natural Science Fund (11601267).

Author information


Corresponding author

Correspondence to Zhengbang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 1

Proof

For any p-dimensional real vector \(\mathbf{l }=(l_1,l_2,\ldots ,l_p)'_{p\times 1}\) with \(\vert \vert \mathbf{l }\vert \vert ^2_2=\sum \nolimits _{s=1}^p l_s^2<M_p\) and any positive integer p, there exists a positive real number \(\alpha \) such that the conditions \(\frac{\mathbf{l }'\Sigma _1\mathbf{l }}{p^\alpha }<C\) and \( \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{p^\alpha }<C \) are satisfied, where C is a finite positive constant not depending on p. Denote by \(f_X(t)\) the characteristic function of the one-dimensional random variable \( \frac{\mathbf{l }' \mathbf{X }}{p^{\alpha /2}} =\frac{1}{p^{\alpha /2}}\sum \nolimits ^p_{q=1}l_qX_q\), and by \(f_Y(t)\) the characteristic function of the one-dimensional random variable \( \frac{\mathbf{l }' \mathbf{Y }}{p^{\alpha /2}} =\frac{1}{p^{\alpha /2}}\sum \nolimits ^p_{q=1}l_qY_q\). Then we can get

$$\begin{aligned} f_X(t) = 1 + \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}(it) - \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2), \end{aligned}$$

and

$$\begin{aligned} f_Y(t) = 1 + \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}(it) - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2). \end{aligned}$$

The characteristic function of \(\frac{(m+n)^{1/2}\,l_1({{\bar{X}}}_1 -{{\bar{Y}}}_1)}{p^{\alpha /2}}+\cdots +\frac{(m+n)^{1/2}\,l_p({{\bar{X}}}_p -{{\bar{Y}}}_p)}{p^{\alpha /2}}\) is

$$\begin{aligned} g_1(t;l_1,l_2,\ldots ,l_p)= & {} \left[ f_X\left( \frac{(m+n)^{1/2}}{m}t\right) \right] ^m \left[ f_Y\left( -\frac{(m+n)^{1/2}}{n}t\right) \right] ^n\\= & {} \left[ 1 + \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{m}(it) - \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m^2}t^2 + o(\frac{(m+n)}{m^2}t^2)\right] ^m\\&\left[ 1 - \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{n}(it) - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{n^2} t^2 + o\left( \frac{(m+n)}{n^2}t^2\right) \right] ^n . \end{aligned}$$

Under the null hypothesis, as \(m,n \rightarrow \infty \) with \(\frac{m}{m+n}\rightarrow \theta \in (0,1)\), we have

$$\begin{aligned}&\log [g_1(l_1,l_2,\ldots ,l_p)] \\&\quad = m \log \left[ 1 + \frac{\mathbf{l }' \mu _1}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{m}(it) - \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m^2}t^2 + o\left( \frac{(m+n)}{m^2}t^2\right) \right] \\&\qquad + n \log \left[ 1 - \frac{\mathbf{l }' \mu _2}{p^{\alpha /2}}\frac{(m+n)^{1/2}}{n}(it) - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }} \frac{(m+n)}{n^2}t^2 + o\left( \frac{(m+n)}{n^2}t^2\right) \right] \\&\quad =- \frac{\mathbf{l }'\Sigma _1\mathbf{l }}{2p^{\alpha }}\frac{(m+n)}{m}t^2 - \frac{\mathbf{l }'\Sigma _2\mathbf{l }}{2p^{\alpha }} \frac{(m+n)}{n}t^2 + o\left( \frac{(m+n)}{m}t^2\right) + o\left( \frac{(m+n)}{n}t^2\right) \\&\quad =-\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2). \end{aligned}$$

Namely, \(\log [g_1(t;l_1,l_2,\ldots ,l_p)] = -\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }}t^2 + o(t^2)\). Letting \(t=1\), we get

$$\begin{aligned} \log [h_1(l_1,l_2,\ldots ,l_p)] = \log [g_1(1;l_1,l_2,\ldots ,l_p)] = -\frac{\mathbf{l }'(\frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta })\mathbf{l }}{2p^{\alpha }} + {o(1)}, \end{aligned}$$

as \(m,n \rightarrow \infty \). This completes the proof of Theorem 1. \(\square \)
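In other words, the limit obtained above is the logarithm, evaluated at \(t=1\), of the characteristic function of a centered normal variable:

$$\begin{aligned} h_1(l_1,l_2,\ldots ,l_p) \rightarrow \exp \left( -\frac{\sigma ^2_{\mathbf{l }}}{2}\right) , \qquad \sigma ^2_{\mathbf{l }}=\frac{\mathbf{l }'\left( \frac{\Sigma _1}{\theta }+\frac{\Sigma _2}{1-\theta }\right) \mathbf{l }}{p^{\alpha }}. \end{aligned}$$

Since \(\log [g_1(t;l_1,l_2,\ldots ,l_p)]\rightarrow -\sigma ^2_{\mathbf{l }}t^2/2\) for every fixed t, the Lévy continuity theorem gives \(\frac{(m+n)^{1/2}\sum \nolimits _{j=1}^p l_j({{\bar{X}}}_j-{{\bar{Y}}}_j)}{p^{\alpha /2}} \rightarrow N(0,\sigma ^2_{\mathbf{l }})\) in distribution for each fixed \(\mathbf{l }\).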

1.2 Proof that \(T_{m,n,p}= \sum \nolimits _{j=1}^{p}\left( {{\bar{X}}}_{j}-{{\bar{Y}}}_{j}\right) ^{2} -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) \) under the null hypothesis with \(\varvec{\mu }_1=\varvec{\mu }_2\)

Proof

We can obtain

$$\begin{aligned}&T_{m,n,p}-\left\| \varvec{\mu }_{1}-\varvec{\mu }_{2}\right\| ^{2}\\&\quad = \frac{\sum \nolimits _{(i \ne j)=1}^{m} ({\varvec{X}}^{(i)})^{T} {\varvec{X}}^{(j)}}{m\left( m-1\right) } +\frac{\sum \nolimits _{(i \ne j)=1}^{n} ({\varvec{Y}}^{(i)})^{T} {\varvec{Y}}^{(j)}}{n\left( n-1\right) }-2 \frac{\sum \nolimits _{i=1}^{m} \sum \nolimits _{j=1}^{n} ({\varvec{X}}^{(i)})^{T} {\varvec{Y}}^{(j)}}{n m}\\&\quad = \frac{\sum \nolimits _{i ,j=1}^{m} ({\varvec{X}}^{(i)})^{T} {\varvec{X}}^{(j)}-\sum \nolimits _{i =1}^{m} ({\varvec{X}}^{(i)})^{T} {\varvec{X}}^{(i)}}{m\left( m-1\right) } + \frac{\sum \nolimits _{i ,j=1}^{n} ({\varvec{Y}}^{(i)})^{T} {\varvec{Y}}^{(j)}-\sum \nolimits _{i =1}^{n} ({\varvec{Y}}^{(i)})^{T} {\varvec{Y}}^{(i)}}{n\left( n-1\right) }\\&\qquad -2 \frac{\sum \nolimits _{i=1}^{m} \sum \nolimits _{j=1}^{n} ({\varvec{X}}^{(i)})^{T} {\varvec{Y}}^{(j)}}{n m} \\&\quad = \frac{m^2 {\bar{{\varvec{X}}}} ^T {\bar{{\varvec{X}}}} -(m-1){\text {tr}} \widehat{\varvec{\Sigma }}_1-m {\bar{{\varvec{X}}}}^T {\bar{{\varvec{X}}}}}{m\left( m-1\right) } + \frac{n^2 {\bar{{\varvec{Y}}}} ^T {\bar{{\varvec{Y}}}} -(n-1){\text {tr}} \widehat{\varvec{\Sigma }}_2-n {\bar{{\varvec{Y}}}}^T {\bar{{\varvec{Y}}}}}{n\left( n-1\right) } \\&\qquad -2 \frac{mn {\bar{{\varvec{X}}}}^{T}{\bar{{\varvec{Y}}}}}{n m}\\&\quad = ({\bar{{\varvec{X}}}}-{\bar{{\varvec{Y}}}})^T({\bar{{\varvec{X}}}}-{\bar{{\varvec{Y}}}}) -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) \\&\quad = \sum \limits _{j=1}^{p}\left( {{\bar{X}}}_{j}-{{\bar{Y}}}_{j}\right) ^{2} -\left( \frac{1}{m}{\text {tr}} \widehat{\varvec{\Sigma }}_1+\frac{1}{n}{\text {tr}} \widehat{\varvec{\Sigma }}_2\right) . \end{aligned}$$

Under the null hypothesis with \(\varvec{\mu }_1=\varvec{\mu }_2\), the term \(\left\| \varvec{\mu }_{1}-\varvec{\mu }_{2}\right\| ^{2}\) vanishes, and the stated expression for \(T_{m,n,p}\) follows. \(\square \)
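The equality between the U-statistic-type expression in the second line of the display and the final line is purely algebraic. A short numerical check, assuming \(\widehat{\varvec{\Sigma }}_1\) and \(\widehat{\varvec{\Sigma }}_2\) are the usual sample covariance matrices with divisors \(m-1\) and \(n-1\) (as the third equality implies), confirms it on arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 7, 9, 5
X = rng.normal(size=(m, p))   # rows are the observations X^{(i)}
Y = rng.normal(size=(n, p))   # rows are the observations Y^{(j)}

# U-statistic-type expression: off-diagonal inner-product sums minus the cross term.
GX, GY, GXY = X @ X.T, Y @ Y.T, X @ Y.T
lhs = ((GX.sum() - np.trace(GX)) / (m * (m - 1))
       + (GY.sum() - np.trace(GY)) / (n * (n - 1))
       - 2.0 * GXY.sum() / (m * n))

# Final line: squared norm of the mean difference minus the trace correction.
xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
S1, S2 = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)  # divisors m-1 and n-1
rhs = np.sum((xbar - ybar) ** 2) - (np.trace(S1) / m + np.trace(S2) / n)

assert np.isclose(lhs, rhs)   # the algebraic identity holds for any data
```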


About this article


Cite this article

Li, Z., Liu, F., Zeng, L. et al. A stationary bootstrap test about two mean vectors comparison with somewhat dense differences and fewer sample size than dimension. Comput Stat 36, 941–960 (2021). https://doi.org/10.1007/s00180-020-01030-x
