A more powerful test of equality of high-dimensional two-sample means
Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2021-07-12, DOI: 10.1016/j.csda.2021.107318
Huaiyu Zhang, Haiyan Wang

A new test is proposed for the equality of two-sample means in high-dimensional data, where the sample sizes may be much smaller than the dimension. The test statistic is a studentized average of squared component-wise t-statistics. Asymptotic normality of the test statistic is derived under H0, and theoretical properties of the power function are given under local alternatives. Compared with a similarly constructed competing test in the recent literature, the new test has much better type I error control and power, owing to a more efficient estimate of the scaling parameter in the test statistic. Monte Carlo experiments show that the new test outperforms several popular competing tests under various data settings, especially when the components of the data vector are highly correlated. The results are established under the condition that there exists a permutation of the component indices under which the correlation decays suitably fast (at least at a polynomial rate). The test is further evaluated on a real-data task: identifying differentially expressed Gene Ontology terms in acute lymphoblastic leukemia gene expression data. The new test gives more consistent results across random samples of the dataset.
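
As a rough illustration of the kind of statistic the abstract describes, the sketch below computes the average of squared component-wise t-statistics for two samples. The abstract gives neither the paper's scaling parameter estimate nor the exact form of its t-statistics, so two assumptions are made here: the component-wise statistics use the unpooled (Welch) standard error, and the studentization with an asymptotic normal reference is swapped for a plain permutation calibration. The function names `mean_squared_t` and `permutation_pvalue` and the simulated data are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def mean_squared_t(x, y):
    """Average of squared component-wise t-statistics.

    x, y: arrays of shape (n1, p) and (n2, p). Assumes the unpooled (Welch)
    standard error and no constant columns (non-zero variances).
    """
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    se2 = x.var(axis=0, ddof=1) / n1 + y.var(axis=0, ddof=1) / n2
    return float(np.mean(diff**2 / se2))


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for mean_squared_t.

    Assumption: this replaces the paper's studentization and asymptotic
    normal calibration, which the abstract does not spell out.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = x.shape[0]
    observed = mean_squared_t(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the rows of the pooled data.
        idx = rng.permutation(pooled.shape[0])
        stat = mean_squared_t(pooled[idx[:n1]], pooled[idx[n1:]])
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Simulated example with dimension p = 200 much larger than the sample sizes.
rng = np.random.default_rng(1)
x = rng.normal(size=(15, 200))
y = rng.normal(loc=0.5, size=(20, 200))
print(permutation_pvalue(x, y, n_perm=200))
```

Permutation calibration is chosen here only because it needs no variance estimate for the averaged statistic; it does not reproduce the efficiency gain that the abstract attributes to the paper's scaling parameter estimate.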




Updated: 2021-07-19