A flexible and parallelizable approach to genome-wide polygenic risk scores


Genetic Epidemiology (IF 1.7), Pub Date: 2019-07-22, DOI: 10.1002/gepi.22245
Paul J Newcombe (1), Christopher P Nelson (2, 3), Nilesh J Samani (2, 3), Frank Dudbridge (4)


The heritability of most complex traits is driven by variants throughout the genome. Consequently, polygenic risk scores, which combine information on multiple variants genome-wide, have demonstrated improved accuracy in genetic risk prediction. We present a new two-step approach to constructing genome-wide polygenic risk scores from meta-GWAS summary statistics. Local linkage disequilibrium (LD) is adjusted for in Step 1, followed by, uniquely, long-range LD in Step 2. Our algorithm is highly parallelizable, since block-wise analyses in Step 1 can be distributed across a high-performance computing cluster, and flexible, since sparsity and heritability are estimated within each block. Inference is obtained through a formal Bayesian variable selection framework, meaning final risk predictions are averaged over competing models. We compared our method to two alternative approaches, LDpred and lassosum, using all seven traits in the Wellcome Trust Case Control Consortium as well as meta-GWAS summaries for type 1 diabetes (T1D), coronary artery disease, and schizophrenia. Performance was generally similar across methods, although our framework provided more accurate predictions for T1D, for which there are multiple heterogeneous signals in regions of both short- and long-range LD. With sufficient compute resources, our method also allows the fastest runtimes.
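
To make the block-wise idea in the abstract concrete, the following is a minimal sketch of how per-block adjustment of summary statistics might be distributed and then combined into a score. It is not the authors' algorithm: the ridge-style shrinkage used here is a simple stand-in for the paper's Bayesian variable selection, the long-range LD adjustment of Step 2 is not attempted, and all names (`adjust_block`, `blockwise_weights`, `risk_score`, the `shrink` parameter) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjust_block(block):
    """Stand-in for Step 1: convert marginal effects to joint effects in one LD block.

    Assumption: solves (R + shrink * I) beta_joint = beta_marginal, where R is the
    block's LD (correlation) matrix. This is ridge-style shrinkage, not the
    Bayesian variable selection described in the abstract.
    """
    ld, marginal, shrink = block
    n = len(marginal)
    reg = [[ld[i][j] + (shrink if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(reg, marginal)


def blockwise_weights(blocks, workers=4):
    """Adjust each block independently and concatenate the per-variant weights.

    Blocks do not depend on one another, so on a cluster each could be a
    separate job; a thread pool is used here only to illustrate the pattern.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_block = list(pool.map(adjust_block, blocks))
    return [w for block in per_block for w in block]


def risk_score(weights, genotypes):
    """Polygenic risk score: weighted sum of allele counts for one individual."""
    return sum(w * g for w, g in zip(weights, genotypes))
```

A call such as `blockwise_weights([(ld_matrix, marginal_effects, 0.1), ...])` returns one weight per variant in block order, which `risk_score` then applies to an individual's allele counts. Because the blocks are independent, the runtime of this step is bounded by the slowest block when enough workers are available, which is the property the abstract relies on for its runtime claim.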


Updated: 2019-11-01
