Accurate and Efficient P-value Calculation via Gaussian Approximation: a Novel Monte-Carlo Method
Journal of the American Statistical Association (IF 3.0), published 2018-06-28, DOI: 10.1080/01621459.2017.1407776
Yaowu Liu, Jun Xie

ABSTRACT It is of fundamental interest in statistics to test the significance of a set of covariates. For example, in genome-wide association studies, a joint null hypothesis of no genetic effect is tested for a set of multiple genetic variants. The minimum p-value method, higher criticism, and Berk–Jones tests are particularly effective when the covariates with nonzero effects are sparse. However, correlations among the covariates and a non-Gaussian response distribution make p-value calculation for these three tests highly challenging. In practice, permutation is commonly used to obtain accurate p-values, but it is computationally very intensive, especially when a large number of hypothesis tests must be conducted. In this paper, we propose a Gaussian approximation method based on a Monte Carlo scheme, which is computationally more efficient than permutation while achieving similar accuracy. We derive nonasymptotic approximation error bounds that could vanish in the limit even if the number of covariates is much larger than the sample size. Through simulations based on real genotypes and an analysis of genome-wide association data on Crohn’s disease, we compare the accuracy and computational cost of our proposed method, permutation, and the method based on the asymptotic distribution. Supplementary materials for this article are available online.
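The Monte Carlo idea summarized in the abstract can be illustrated with a minimal, stdlib-only sketch. Everything here is an assumption for illustration, not the authors' algorithm: the marginal statistics are taken to be Z_j = √n · corr(x_j, y) and treated as approximately N(0, R), where R is the sample correlation matrix of the covariates. Null draws are generated as Z* = X̃ᵀg with centered, unit-norm columns X̃ and g ~ N(0, I_n), so no p × p matrix is factorized (convenient when p is much larger than n). The min-p test is represented by max_j |Z_j|, higher criticism by its plain untruncated form, and the Berk–Jones statistic is omitted. All function names are hypothetical. A permutation p-value is included only as a reference; in this toy setting both loops cost O(np) per draw, so the sketch shows the approximation, not the speed-up the paper reports.

```python
import math
import random


def standardize_columns(X):
    """Center each column of X (n rows, p columns) and scale it to unit Euclidean norm."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        centered = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in centered))  # assumes the column is not constant
        cols.append([v / norm for v in centered])
    return cols  # p lists of length n


def score_stats(cols, y):
    """Marginal statistics Z_j = sqrt(n) * corr(x_j, y) for standardized columns."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    sy = math.sqrt(sum(v * v for v in yc) / n)
    return [sum(c * v for c, v in zip(col, yc)) / sy for col in cols]


def min_p_stat(z):
    """Min-p test written as max_j |Z_j|; a larger value means a smaller minimum p-value."""
    return max(abs(v) for v in z)


def higher_criticism(z):
    """Untruncated higher criticism over two-sided marginal normal p-values."""
    p = len(z)
    pvals = sorted(math.erfc(abs(v) / math.sqrt(2)) for v in z)
    best = -math.inf
    for i, pv in enumerate(pvals, start=1):
        if 0 < pv < 1:  # skip p-values where the denominator would vanish
            best = max(best, math.sqrt(p) * (i / p - pv) / math.sqrt(pv * (1 - pv)))
    return best


def gaussian_mc_pvalue(cols, t_obs, stat, draws=1000, seed=0):
    """Monte Carlo p-value under the approximation Z ~ N(0, R), R = sample correlation of X.

    Each draw is Z*_j = <x_j, g> with g ~ N(0, I_n); because the columns are centered
    and unit-norm, Cov(Z*) equals R exactly and no p x p matrix is formed.
    """
    rng = random.Random(seed)
    n = len(cols[0])
    exceed = 0
    for _ in range(draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_star = [sum(c * gi for c, gi in zip(col, g)) for col in cols]
        if stat(z_star) >= t_obs:
            exceed += 1
    return (exceed + 1) / (draws + 1)


def permutation_pvalue(cols, y, t_obs, stat, draws=1000, seed=0):
    """Reference p-value: shuffle y and recompute the statistic each time."""
    rng = random.Random(seed)
    y_perm = list(y)
    exceed = 0
    for _ in range(draws):
        rng.shuffle(y_perm)
        if stat(score_stats(cols, y_perm)) >= t_obs:
            exceed += 1
    return (exceed + 1) / (draws + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 100, 20
    # Correlated binary covariates: neighbouring columns share a latent Gaussian term.
    latent = [[rng.gauss(0.0, 1.0) for _ in range(p + 1)] for _ in range(n)]
    X = [[float(latent[i][j] + latent[i][j + 1] > 0) for j in range(p)] for i in range(n)]
    y = [float(rng.random() < 0.3) for _ in range(n)]  # binary response, generated under the null
    cols = standardize_columns(X)
    z = score_stats(cols, y)
    for name, stat in [("min-p", min_p_stat), ("HC", higher_criticism)]:
        t = stat(z)
        print(name, gaussian_mc_pvalue(cols, t, stat), permutation_pvalue(cols, y, t, stat))
```

Because both estimates have the form (exceed + 1)/(draws + 1), the smallest reportable p-value is 1/(draws + 1); resolving very small p-values therefore needs many draws, which is where the cost of each draw matters.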

Updated: 2018-06-28