Abstract
In this paper, we investigate the problem of high-dimensional multivariate analysis of variance under a low-dimensional factor structure, which violates some vital assumptions on the covariance matrix made in the existing literature. We propose a new test and show that, under the null hypothesis and mild conditions, the asymptotic distribution of the test statistic is a weighted sum of chi-square variables with 1 degree of freedom. We provide numerical studies of both size and power to illustrate the performance of the proposed test.
References
Ahn, S.C., Horenstein, A.R.: Eigenvalue ratio test for the number of factors. Econometrica 81, 1203–1227 (2013)
Bai, Z., Saranadasa, H.: Effect of high dimension: by an example of a two sample problem. Stat. Sin. 6, 311–329 (1996)
Cai, T., Xia, Y.: High-dimensional sparse MANOVA. J. Multivar. Anal. 131, 174–196 (2014)
Cao, M., Park, J., He, D.: A test for \(k\) sample Behrens–Fisher problem in high dimensional data. J. Stat. Plan. Inference 201, 86–102 (2019)
Chen, S., Qin, Y.: A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 38, 808–835 (2010)
Chen, S., Zhang, L., Zhong, P.: Tests for high-dimensional covariance matrices. J. Am. Stat. Assoc. 105, 810–819 (2010)
Fujikoshi, Y., Himeno, T., Wakaki, H.: Asymptotic results of a high dimensional MANOVA test and power comparisons when the dimension is large compared to the sample size. J. Jpn. Stat. Soc. 34, 19–26 (2004)
Hu, J., Bai, Z., Wang, C., Wang, W.: On testing the equality of high dimensional mean vectors with unequal covariance matrices. Ann. Inst. Stat. Math. 69, 365–387 (2017)
Ma, Y., Lan, W., Wang, H.: A high dimensional two-sample test under a low dimensional factor structure. J. Multivar. Anal. 140, 162–170 (2015)
Muirhead, R.J.: Aspects of Multivariate Statistical Theory. Wiley, New York (1982)
Schott, J.: Some high-dimensional tests for a one-way MANOVA. J. Multivar. Anal. 98, 1825–1839 (2007)
Srivastava, M.: Multivariate theory for analyzing high dimensional data. J. Jpn. Stat. Soc. 37, 53–86 (2007)
Wang, R., Xu, X.: Least favorable direction test for multivariate analysis of variance in high dimension. Stat. Sin. (2019). http://www3.stat.sinica.edu.tw/ss_newpaper/SS-2018-0279_na.pdf
Wang, H.: Factor profiled sure independence screening. Biometrika 99, 15–28 (2012)
Wang, R., Xu, X.: On two sample mean tests under spiked covariances. J. Multivar. Anal. 167, 225–249 (2018)
Yamada, T., Srivastava, M.: A test for multivariate analysis of variance in high dimension. Commun. Stat. Simul. Comput. 41, 2602–2615 (2012)
Zhang, J.: Approximate and asymptotic distributions of chi-squared-type mixtures with applications. J. Am. Stat. Assoc. 100, 273–285 (2005)
Zhang, J., Guo, J., Zhou, B.: Linear hypothesis testing in high-dimensional one-way MANOVA. J. Multivar. Anal. 155, 200–216 (2017)
Zhu, Z., Ong, Y.S., Dash, M.: Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn. 49, 3236–3248 (2007)
Acknowledgements
The authors thank the Editor-in-Chief, Professor Niansheng Tang, and two anonymous referees for their constructive comments, suggestions and detailed advice that vastly improved this article. Cao’s research is supported by the National Statistical Science Research Program (No. 2020LY002), the National Natural Science Foundation of China (Nos. 11601008, 11526070) and the Doctor Startup Foundation of Anhui Normal University (No. 2016XJJ101). He’s research is supported by Anhui Provincial Natural Science Foundation (No. 2008085MA08). Huang’s research is supported by Anhui Provincial Natural Science Foundation (No. 1908085MA20).
Appendices
Appendix A: Some Lemmas
The following lemma is obtained from Theorem 1 in [9].
Lemma A.1
Under the assumptions (A1) and (A2), it holds that
(1) \(\lambda _{l}(\varvec{\Sigma })/p=\lambda _{l}(\varvec{\Sigma }_{{\mathbf{A}}})+o(1)\) for \(1\le l\le q\);
(2) \(\lambda _{l}(\varvec{\Sigma })\le c\) for \(l>q\), where \(c\) is a constant.
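The two regimes in Lemma A.1 can be illustrated numerically. The sketch below is only an illustration under assumptions that are not taken from the paper: the covariance is built as \(\varvec{\Sigma }={\mathbf{A}}{\mathbf{A}}^{^{{\mathrm{T}}}}+{\mathbf{H}}\) with a randomly drawn loading matrix, \({\mathbf{H}}={\mathbf{I}}_{p}\), and \(\varvec{\Sigma }_{{\mathbf{A}}}\) replaced by its finite-sample stand-in \({\mathbf{A}}^{^{{\mathrm{T}}}}{\mathbf{A}}/p\). The function name and the values of p and q are hypothetical.

```python
import numpy as np

def factor_covariance_spectrum(p, q, seed=0):
    """Eigenvalues (descending) of Sigma = A A^T + H and of A^T A / p."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((p, q))   # hypothetical loading matrix (assumption)
    H = np.eye(p)                     # hypothetical idiosyncratic covariance (assumption)
    sigma = A @ A.T + H
    eig_sigma = np.linalg.eigvalsh(sigma)[::-1]
    eig_factor = np.linalg.eigvalsh(A.T @ A / p)[::-1]
    return eig_sigma, eig_factor

p, q = 400, 3
eig_sigma, eig_factor = factor_covariance_spectrum(p, q)

# Part (1): the q leading eigenvalues, scaled by 1/p, track those of A^T A / p.
spiked_gap = np.max(np.abs(eig_sigma[:q] / p - eig_factor))
# Part (2): the remaining eigenvalues stay bounded.
bulk_max = eig_sigma[q:].max()
print(spiked_gap, bulk_max)
```

With \({\mathbf{H}}={\mathbf{I}}_{p}\) the gap in part (1) is exactly \(1/p\) and the remaining eigenvalues all equal 1, so the example is a best case; a general bounded \({\mathbf{H}}\) would only give the \(o(1)\) and boundedness statements of the lemma.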
Lemma A.2
Let the \(m\)-dimensional random vector \({\mathbf{Y}}=(y_{1},\ldots ,y_{m})^{^{{\mathrm{T}}}}\) have mutually independent components \(\{y_{i}\}_{i=1}^{m}\) with \(\text{ E }({\mathbf{Y}})={\mathbf{0}}\), \(\text{ Cov }({\mathbf{Y}})={\mathbf{I}}_{m}\) and \(\text{ E }(y_{i}^{4})=3+\gamma \), where \(\gamma \ge -2\) is a known constant. Then, for any \(m\times m\) symmetric matrix \({\mathbf{G}}\), we have \(\text{ Var }({\mathbf{Y}}^{^{{\mathrm{T}}}}{\mathbf{G}}{\mathbf{Y}})=2\text{ tr }({\mathbf{G}}^{2})+\gamma \text{ tr }({\mathbf{G}}\circ {\mathbf{G}})\), where \(\circ \) denotes the Hadamard product of matrices.
Proof
The proof is straightforward; see also Proposition A.1 in [6]. \(\square \)
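The variance formula of Lemma A.2 can be checked exactly in a small case. The sketch below is an illustration rather than part of the proof: it takes the \(y_{i}\) to be independent random signs (values \(\pm 1\) with probability 1/2), for which \(\text{ E }(y_{i}^{4})=1\) and hence \(\gamma =-2\), and enumerates all sign patterns. The matrix G and the function names are hypothetical.

```python
from itertools import product

def quadratic_form_variance_signs(G):
    """Exact Var(Y^T G Y) when the y_i are independent +/-1 signs."""
    m = len(G)
    values = [
        sum(G[i][j] * s[i] * s[j] for i in range(m) for j in range(m))
        for s in product((-1, 1), repeat=m)   # all 2^m equally likely outcomes
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def lemma_a2_formula(G, gamma):
    """2 tr(G^2) + gamma tr(G o G) for a symmetric matrix G."""
    m = len(G)
    trace_g2 = sum(G[i][j] * G[j][i] for i in range(m) for j in range(m))
    trace_hadamard = sum(G[i][i] ** 2 for i in range(m))
    return 2 * trace_g2 + gamma * trace_hadamard

G = [[2, 1, 0],
     [1, -1, 3],
     [0, 3, 4]]                       # hypothetical symmetric matrix
exact = quadratic_form_variance_signs(G)
formula = lemma_a2_formula(G, gamma=-2)
print(exact, formula)                 # both equal 40
```

Random signs sit at the boundary case \(\gamma =-2\), where the diagonal of \({\mathbf{G}}\) contributes nothing to the variance and only the off-diagonal entries remain.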
Lemma A.3 follows from Lemma A.11 of [1]; the proof is omitted.
Lemma A.3
Under the factor model (1.1), for \(j\in \{1,\ldots ,q\}\) and as \(p, n\rightarrow \infty \),
Appendix B: Proofs of Theorems
Proof of Theorem 2.1
First, under the null hypothesis \(H_{0}\),
where \(\overline{{\mathbf{Z}}}_{g}=\frac{1}{n_{g}}\sum _{i=1}^{n_{g}}{\mathbf{Z}}_{gi}\) and \(\overline{{\mathbf{e}}}_{g}=\frac{1}{n_{g}}\sum _{i=1}^{n_{g}}{\mathbf{e}}_{gi}\). Then we have
We first consider the term \(\sum _{i<j}^{k}p^{-1}n_{ij}I_{ij}\). By the central limit theorem, \(\sqrt{n_{g}}\overline{{\mathbf{Z}}}_{g}\) \({\mathop {\longrightarrow }\limits ^{d}}N({\mathbf{0}},{\mathbf{I}}_{q})\) as \(n\rightarrow \infty \) for each \(g\in \{1,\ldots ,k\}\). Denote \(\overline{{\mathbf{Z}}}=(\sqrt{n_{1}}\overline{{\mathbf{Z}}}_{1}^{^{{\mathrm{T}}}},\ldots ,\sqrt{n_{k}}\overline{{\mathbf{Z}}}_{k}^{^{{\mathrm{T}}}})^{^{{\mathrm{T}}}}\); then we have \(\overline{{\mathbf{Z}}}{\mathop {\longrightarrow }\limits ^{d}}N({\mathbf{0}},{\mathbf{I}}_{kq})\) and \(\overline{{\mathbf{Z}}}_{i}-\overline{{\mathbf{Z}}}_{j}=({\mathbf{c}}_{ij}^{^{{\mathrm{T}}}}\otimes {\mathbf{I}}_{q})\overline{{\mathbf{Z}}}\). Therefore,
where the last equality holds via the assumption (A2).
Let the spectral decompositions of matrices \(\varvec{\Sigma }_{{\mathbf{A}}}\) and \({\mathbf{C}}\) be \({\mathbf{Q}}\varvec{\Lambda }{\mathbf{Q}}^{^{{\mathrm{T}}}}\) and \({\mathbf{Q}}_{1}\varvec{\Delta }{\mathbf{Q}}_{1}^{^{{\mathrm{T}}}}\), respectively, then
Note that the rank of \({\mathbf{C}}\) is \(k-1\), which, combined with (B.2), shows that
Next we deal with the term \(\sum _{i<j}^{k}p^{-1}n_{ij}V_{ij}\). It is easy to show that \(\text{ E }(V_{ij})=0\) and \(\text{ Var }(V_{ij})=n_{ij}^{-2}\text{ tr }({\mathbf{A}}{\mathbf{A}}^{^{{\mathrm{T}}}}{\mathbf{H}})\). Thus, by assumptions (A1) and (A2), we have
which shows that
For the third term \(\sum _{i<j}^{k}p^{-1}n_{ij}U_{ij}\), we have
It follows from Lemma A.2 and \(U_{ij}=\left( n_{ij}^{\frac{1}{2}}{\mathbf{H}}^{-\frac{1}{2}}(\overline{{\mathbf{e}}}_{i}-\overline{{\mathbf{e}}}_{j})\right) ^{^{{\mathrm{T}}}}(n_{ij}^{-1}{\mathbf{H}})\left( n_{ij}^{\frac{1}{2}}{\mathbf{H}}^{-\frac{1}{2}}(\overline{{\mathbf{e}}}_{i}-\overline{{\mathbf{e}}}_{j})\right) \) that \(\text{ Var }(U_{ij})=o(p^{2}n_{ij}^{-2})\). Hence, \(\text{ Var }\left( \sum \nolimits _{i<j}^{k}p^{-1}n_{ij}U_{ij}\right) =o(1)\) which results in
Finally, the proof is completed by combining (B.1) and (B.3)–(B.5). \(\square \)
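Two algebraic facts used in the proof of Theorem 2.1 can be checked numerically: the Kronecker representation \(\overline{{\mathbf{Z}}}_{i}-\overline{{\mathbf{Z}}}_{j}=({\mathbf{c}}_{ij}^{^{{\mathrm{T}}}}\otimes {\mathbf{I}}_{q})\overline{{\mathbf{Z}}}\) and the rank of \({\mathbf{C}}\). The sketch below rests on assumptions that should be read as such: \({\mathbf{c}}_{ij}\) is taken to have \(n_{i}^{-1/2}\) in position i and \(-n_{j}^{-1/2}\) in position j (which is what the stacking of \(\overline{{\mathbf{Z}}}\) requires), \(n_{ij}=n_{i}n_{j}/(n_{i}+n_{j})\), and \({\mathbf{C}}=\sum _{i<j}n_{ij}{\mathbf{c}}_{ij}{\mathbf{c}}_{ij}^{^{{\mathrm{T}}}}\). The group sizes and random vectors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
k, q = 4, 2
n = np.array([5, 8, 6, 9])                  # hypothetical group sizes
Zbar_groups = rng.standard_normal((k, q))   # stand-ins for the group means of Z
# Stacked vector (sqrt(n_1) Zbar_1^T, ..., sqrt(n_k) Zbar_k^T)^T of length k*q.
Zbar = (np.sqrt(n)[:, None] * Zbar_groups).reshape(-1)

def contrast(i, j):
    """Assumed form of c_ij: 1/sqrt(n_i) at i, -1/sqrt(n_j) at j."""
    c = np.zeros(k)
    c[i] = 1 / np.sqrt(n[i])
    c[j] = -1 / np.sqrt(n[j])
    return c

max_err = 0.0
C = np.zeros((k, k))
for i in range(k):
    for j in range(i + 1, k):
        c = contrast(i, j)
        lhs = Zbar_groups[i] - Zbar_groups[j]
        rhs = np.kron(c[None, :], np.eye(q)) @ Zbar   # (c_ij^T kron I_q) Zbar
        max_err = max(max_err, float(np.max(np.abs(lhs - rhs))))
        n_ij = n[i] * n[j] / (n[i] + n[j])            # assumed definition of n_ij
        C += n_ij * np.outer(c, c)                    # assumed definition of C

rank_C = int(np.linalg.matrix_rank(C))
print(max_err, rank_C)
```

Under these assumptions every \({\mathbf{c}}_{ij}\) is orthogonal to \((\sqrt{n_{1}},\ldots ,\sqrt{n_{k}})^{^{{\mathrm{T}}}}\), which is why the rank comes out as \(k-1\) rather than k.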
Proof of Theorem 2.3
In order to prove this theorem, we only need to show \(p^{-1}\text{ tr }({\mathbf{S}})-p^{-1}\text{ tr }(\varvec{\Sigma }){\mathop {\longrightarrow }\limits ^{P}}0\) and \(p^{-1}\sum _{l=1}^{q}\lambda _{l}({\mathbf{S}})-\text{ tr }(\varvec{\Sigma }_{{\mathbf{A}}}){\mathop {\longrightarrow }\limits ^{P}}0\). Note that
Firstly, it follows from \(\frac{1}{n_{g}}\sum _{i=1}^{n_{g}}({\mathbf{Z}}_{gi}-\overline{{\mathbf{Z}}}_{g})({\mathbf{Z}}_{gi}-\overline{{\mathbf{Z}}}_{g})^{^{{\mathrm{T}}}}{\mathop {\longrightarrow }\limits ^{P}}{\mathbf{I}}_{q}\) that \({\mathbf{R}}_{1}={\mathbf{A}}{\mathbf{A}}^{^{{\mathrm{T}}}}\{1+o_{p}(1)\}\) as \(n\rightarrow \infty \). Consequently,
For the term \({\mathbf{R}}_{2}\), it is easy to get \(\text{ E }({\mathbf{R}}_{2})={\mathbf{H}}\) and
where \(c_{1}\) is a known and finite constant. Thus we have
which suggests that as \(n,p\rightarrow \infty \),
Next, consider the term \({\mathbf{R}}_{3}\). It is straightforward to show that \(\text{ E }({\mathbf{R}}_{3})=0\) and
Thus,
which results in
Finally, \(p^{-1}\text{ tr }({\mathbf{S}})-p^{-1}\text{ tr }(\varvec{\Sigma }){\mathop {\longrightarrow }\limits ^{P}}0\) follows from (B.6)–(B.9).
The statement \(p^{-1}\sum _{l=1}^{q}\lambda _{l}({\mathbf{S}})-\text{ tr }(\varvec{\Sigma }_{{\mathbf{A}}}){\mathop {\longrightarrow }\limits ^{P}}0\) follows from Lemmas A.1 and A.3. Therefore, the proof of Theorem 2.3 is completed. \(\square \)
Proof of Theorem 2.5
It is easy to see that \(\hat{\xi }_{\alpha }{\mathop {\longrightarrow }\limits ^{P}}\xi _{\alpha }\). Thus the theorem follows if we can verify that \(\widehat{T}{\mathop {\longrightarrow }\limits ^{P}}\infty \). Let \({\mathbf{Y}}_{gi}:={\mathbf{X}}_{gi}-\varvec{\mu }_{g}\); then
where \(\omega =p^{-1}\sum _{i<j}^{k}n_{ij}\Vert \overline{{\mathbf{Y}}}_{i}-\overline{{\mathbf{Y}}}_{j}\Vert ^{2} -\frac{k(k-1)}{2p}\left\{ \text{ tr }({\mathbf{S}})-\sum _{l=1}^{\hat{q}}\lambda _{l}({\mathbf{S}})\right\} \), \(Q_{ij}=(\varvec{\mu }_{i}-\varvec{\mu }_{j})^{^{{\mathrm{T}}}}(\overline{{\mathbf{Y}}}_{i}-\overline{{\mathbf{Y}}}_{j})\) and \(\overline{{\mathbf{Y}}}_{g}=n_{g}^{-1}\sum _{i=1}^{n_{g}}{\mathbf{Y}}_{gi}\).
It follows from Theorems 2.1 and 2.3 that
For the term \(Q_{ij}\), it is easy to get \(\text{ E }(Q_{ij})=0\) and \(\text{ Var }(Q_{ij})=n_{ij}^{-1}(\varvec{\mu }_{i}-\varvec{\mu }_{j})^{^{{\mathrm{T}}}}\varvec{\Sigma }(\varvec{\mu }_{i}-\varvec{\mu }_{j}){\le }n_{ij}^{-1}\lambda _{1}(\varvec{\Sigma })\Vert \varvec{\mu }_{i}-\varvec{\mu }_{j}\Vert ^{2}\). Thus, \(\text{ Var }(p^{-1}n_{ij}Q_{ij})\le {p}^{-1}\lambda _{1}(\varvec{\Sigma })\delta _{ij}=O(\delta _{ij})\), which shows that \(p^{-1}n_{ij}Q_{ij}=O_{P}(\delta _{ij}^{1/2})\). Then we have
Finally, the conclusion follows immediately from (B.10)–(B.12) and \(\sqrt{\delta }\le \sum _{i<j}^{k}\delta _{ij}^{1/2}\). \(\square \)
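The two elementary inequalities used in the proof of Theorem 2.5 can be illustrated numerically: the eigenvalue bound \((\varvec{\mu }_{i}-\varvec{\mu }_{j})^{^{{\mathrm{T}}}}\varvec{\Sigma }(\varvec{\mu }_{i}-\varvec{\mu }_{j})\le \lambda _{1}(\varvec{\Sigma })\Vert \varvec{\mu }_{i}-\varvec{\mu }_{j}\Vert ^{2}\) and \(\sqrt{\delta }\le \sum _{i<j}^{k}\delta _{ij}^{1/2}\). This is a sketch with hypothetical inputs; in particular it assumes \(\delta =\sum _{i<j}^{k}\delta _{ij}\), and the covariance matrix, mean difference and \(\delta _{ij}\) values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 50
B = rng.standard_normal((p, p))
sigma = B @ B.T / p                    # hypothetical covariance matrix
diff = rng.standard_normal(p)          # stands in for mu_i - mu_j

# Quadratic form versus largest eigenvalue times squared norm.
quad = float(diff @ sigma @ diff)
bound = float(np.linalg.eigvalsh(sigma)[-1] * (diff @ diff))

# Square root of a sum versus sum of square roots (assumes delta = sum of delta_ij).
delta_ij = np.array([0.3, 1.2, 0.05])  # hypothetical pairwise signal strengths
sqrt_of_sum = float(np.sqrt(delta_ij.sum()))
sum_of_sqrt = float(np.sqrt(delta_ij).sum())
print(quad <= bound, sqrt_of_sum <= sum_of_sqrt)
```

The first inequality is the Rayleigh quotient bound for a symmetric matrix; the second follows by squaring both sides, since the cross terms on the right are nonnegative.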
Cite this article
Cao, M., Zhao, Y., Xu, K. et al. A High-Dimensional Test for Multivariate Analysis of Variance Under a Low-Dimensional Factor Structure. Commun. Math. Stat. 10, 581–597 (2022). https://doi.org/10.1007/s40304-020-00236-1
DOI: https://doi.org/10.1007/s40304-020-00236-1