Simultaneous Detection of Signal Regions Using Quadratic Scan Statistics With Applications to Whole Genome Association Studies
Journal of the American Statistical Association (IF 3.7), Pub Date: 2020-11-12, DOI: 10.1080/01621459.2020.1822849
Zilin Li, Yaowu Liu, Xihong Lin

Abstract

We consider in this article detection of signal regions associated with disease outcomes in whole genome association studies. Gene- or region-based methods have become increasingly popular in whole genome association analysis as a complementary approach to traditional individual variant analysis. However, these methods test for the association between an outcome and the genetic variants in a prespecified region, for example, a gene. In view of massive intergenic regions in whole genome sequencing (WGS) studies, we propose a computationally efficient quadratic scan (Q-SCAN) statistic based method to detect the existence and the locations of signal regions by scanning the genome continuously. The proposed method accounts for the correlation (linkage disequilibrium) among genetic variants, and allows for signal regions to have both causal and neutral variants, and the effects of signal variants to be in different directions. We study the asymptotic properties of the proposed Q-SCAN statistics. We derive an empirical threshold that controls for the family-wise error rate, and show that under regularity conditions the proposed method consistently selects the true signal regions. We perform simulation studies to evaluate the finite sample performance of the proposed method. Our simulation results show that the proposed procedure outperforms the existing methods, especially when signal regions have causal variants whose effects are in different directions, or are contaminated with neutral variants. We illustrate Q-SCAN by analyzing the WGS data from the Atherosclerosis Risk in Communities study. Supplementary materials for this article are available online.
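To make the scanning idea concrete, the following is a minimal sketch of a quadratic scan over sliding windows. It is not the authors' Q-SCAN implementation: it assumes the per-variant score statistics are independent standard normal under the null (so it ignores linkage disequilibrium, which the proposed method accounts for), and the function names `quadratic_scan` and `null_threshold`, the window-length bounds, and the Monte Carlo threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def quadratic_scan(z, min_len, max_len):
    """Scan all windows [start, end) with min_len <= length <= max_len (min_len >= 1).

    Assumption: the entries of z are independent N(0, 1) under the null, so the
    sum of squares over a window of length L has mean L and variance 2L.
    Returns (best standardized statistic, start, end).
    """
    z = np.asarray(z, dtype=float)
    # Prefix sums of squared scores give every window sum in O(1).
    csum = np.concatenate(([0.0], np.cumsum(z ** 2)))
    best = (-np.inf, 0, 0)
    for length in range(min_len, min(max_len, z.size) + 1):
        q = csum[length:] - csum[:-length]           # quadratic statistic per window
        stat = (q - length) / np.sqrt(2.0 * length)  # standardize across window lengths
        i = int(np.argmax(stat))
        if stat[i] > best[0]:
            best = (float(stat[i]), i, i + length)
    return best


def null_threshold(n, min_len, max_len, alpha=0.05, n_sim=200, seed=0):
    """Monte Carlo (1 - alpha) quantile of the scan maximum under the independent null."""
    rng = np.random.default_rng(seed)
    maxima = [
        quadratic_scan(rng.standard_normal(n), min_len, max_len)[0]
        for _ in range(n_sim)
    ]
    return float(np.quantile(maxima, 1.0 - alpha))


# Toy example: five adjacent signals with alternating signs and no noise elsewhere.
z = np.zeros(50)
z[10:15] = [3.0, -3.0, 3.0, -3.0, 3.0]
stat, start, end = quadratic_scan(z, min_len=2, max_len=8)
```

In the toy example the signed scores nearly cancel if summed, while their squares accumulate, which illustrates why a quadratic statistic remains sensitive when effects point in different directions. Comparing the scan maximum with a threshold calibrated on the null distribution of that maximum is one simple way to aim at family-wise error control under the stated independence assumption.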




Updated: 2020-11-12