An approach to gene-based testing accounting for dependence of tests among nearby genes
Briefings in Bioinformatics (IF 6.8), Pub Date: 2021-07-29, DOI: 10.1093/bib/bbab329
Ronald Yurko, Kathryn Roeder, Bernie Devlin, Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of single-nucleotide polymorphisms (SNPs) for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing the multiple-testing burden and pooling signal strength. While such tests account for the linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD between SNPs that fall in different nearby genes, which can induce correlation among gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of the others, the gene is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into the SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of a weakly powered GWAS for autism spectrum disorder, which we contrast with more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive $P$-value thresholding, guided by high-dimensional metadata modeled with gradient boosted trees, and highlight when and how it can be most useful. Notably, our workflow is based on summary statistics.
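
The agglomeration idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the pairwise correlations between gene-based test statistics are already available, and the function name `agglomerate_genes`, the gene labels, and the `threshold` value are all hypothetical choices made for illustration. Genes linked by a correlation at or above the threshold end up in the same locus, and every other gene stays on its own.

```python
def agglomerate_genes(genes, corr, threshold=0.75):
    """Group genes into loci (illustrative sketch, not the paper's algorithm).

    genes: list of gene names, in the order they should be reported.
    corr: dict mapping (gene_a, gene_b) pairs to the correlation of their
        gene-based test statistics (assumed to be precomputed).
    threshold: hypothetical cutoff above which two genes are merged.
    """
    # Union-find structure: each gene starts as its own locus.
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the locus representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Merge the loci of every strongly correlated pair.
    for (a, b), r in corr.items():
        if abs(r) >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect genes by locus representative, preserving the input order.
    loci = {}
    for g in genes:
        loci.setdefault(find(g), []).append(g)
    return list(loci.values())


# Hypothetical example: only GENE_A and GENE_B are strongly correlated.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
corr = {
    ("GENE_A", "GENE_B"): 0.9,
    ("GENE_B", "GENE_C"): 0.1,
    ("GENE_C", "GENE_D"): 0.05,
}
loci = agglomerate_genes(genes, corr)
print(loci)
```

In this toy input, GENE_A and GENE_B would be tested together as one locus, while GENE_C and GENE_D would each be tested as separate genes. Merging through connected components is one simple design choice; the abstract does not specify how correlated genes are grouped, so a real workflow might merge differently.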

Updated: 2021-07-29