Leveraging fine-mapping and non-European training data to improve cross-population polygenic risk scores
medRxiv - Genetic and Genomic Medicine. Pub Date: 2021-08-20, DOI: 10.1101/2021.01.19.21249483
Omer Weissbrod, Masahiro Kanai, Huwenbo Shi, Steven Gazal, Wouter J. Peyrot, Amit V. Khera, Yukinori Okada, Alicia R. Martin, Hilary Finucane, Alkes L. Price

Polygenic risk scores (PRS) based on European training data suffer reduced accuracy in non-European target populations, exacerbating health disparities. This loss of accuracy predominantly stems from LD differences, MAF differences (including population-specific SNPs), and/or causal effect size differences. PRS based on training data from the non-European target population do not suffer from these limitations, but are currently limited by much smaller training sample sizes. Here, we propose PolyPred, a method that improves cross-population polygenic prediction by combining two complementary predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing LD differences; and BOLT-LMM, a published predictor. In the special case where a large training sample is available in the non-European target population (or a closely related population), we propose PolyPred+, which further incorporates the non-European training data, addressing MAF differences and causal effect size differences. PolyPred and PolyPred+ require individual-level training data (for their BOLT-LMM component), but we also propose analogous methods that replace the BOLT-LMM component with summary statistic-based components if only summary statistics are available. We applied PolyPred to 49 diseases and complex traits in 4 UK Biobank populations using UK Biobank British training data (average N=325K), and observed statistically significant average relative improvements in prediction accuracy vs. BOLT-LMM ranging from +7% in South Asians to +32% in Africans (and vs. LD-pruning + P-value thresholding (P+T) ranging from +77% to +164%), consistent with simulations. 
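
The abstract says PolyPred combines two complementary predictors but does not say how. The sketch below assumes a linear combination. Its non-negative mixing weights are fitted by non-negative least squares (NNLS) on a held-out subset of the target population and then applied to the remaining individuals. This is an illustration of that assumption, not the authors' implementation. `fit_mixing_weights` and `combine_predictors` are hypothetical names, not the PolyPred software's API, and the toy data are invented. For a handful of predictors, NNLS can be solved exactly by enumerating supports. The optimum equals the ordinary least-squares fit on its own support, so the best feasible support-restricted fit is the answer.

```python
from itertools import combinations

import numpy as np


def fit_mixing_weights(preds: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: non-negative least-squares mixing weights.

    preds: (n_individuals, n_predictors) per-predictor PRS values, e.g.
           column 0 = fine-mapping-based PRS, column 1 = BOLT-LMM PRS.
    y:     (n_individuals,) phenotype in a held-out target-population subset.
    Columns and y are mean-centred, so no intercept is fitted.
    """
    X = preds - preds.mean(axis=0)
    yc = y - y.mean()
    k = X.shape[1]
    best_w = np.zeros(k)
    best_sse = float(yc @ yc)  # empty support: all weights zero
    for size in range(1, k + 1):
        for support in combinations(range(k), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], yc, rcond=None)
            if np.any(coef < 0):
                continue  # violates the non-negativity constraint
            resid = yc - X[:, cols] @ coef
            sse = float(resid @ resid)
            if sse < best_sse:
                best_sse = sse
                best_w = np.zeros(k)
                best_w[cols] = coef
    return best_w


def combine_predictors(preds: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combined PRS: weighted sum of the individual predictors."""
    return preds @ weights


if __name__ == "__main__":
    # Invented toy data: two mean-zero, orthogonal "PRS" columns.
    a = np.array([1.0, -1.0, 1.0, -1.0])
    b = np.array([1.0, 1.0, -1.0, -1.0])
    preds = np.column_stack([a, b])
    w = fit_mixing_weights(preds, 0.7 * a + 0.3 * b)
    print(w)  # close to [0.7, 0.3]
    print(combine_predictors(preds, w))
```

The same helper accepts extra columns. A PRS trained on non-European data could be added as a third predictor in the spirit of PolyPred+, though whether PolyPred+ fits its weights this way is likewise an assumption. Constraining weights to be non-negative keeps a noisy predictor from entering with a sign flip. Subset enumeration stays cheap only while the number of predictors is small.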
We applied PolyPred+ to 23 diseases and complex traits in UK Biobank East Asians using both UK Biobank British (average N=325K) and Biobank Japan (average N=124K) training data, and observed statistically significant average relative improvements in prediction accuracy of +24% vs. BOLT-LMM and +12% vs. PolyPred. The summary statistic-based analogues of PolyPred and PolyPred+ attained similar improvements. In conclusion, PolyPred and PolyPred+ improve cross-population polygenic prediction accuracy, ameliorating health disparities.
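
The results above are reported as relative improvements in prediction accuracy between methods. The abstract does not define the accuracy metric or the averaging scheme. The sketch below assumes accuracy is a per-trait R² and that the relative improvement of a new method over a baseline is R²_new / R²_base − 1. Per-trait gains are averaged with an unweighted mean, which is one possible choice among several. The function names and the R² values are hypothetical and invented for illustration; they are not results from the paper.

```python
def relative_improvement(r2_new: float, r2_base: float) -> float:
    """Relative gain of one predictor over a baseline, as a fraction.

    Assumes accuracy is measured as R^2; a return value of 0.24 means "+24%".
    """
    if r2_base <= 0:
        raise ValueError("baseline R^2 must be positive")
    return r2_new / r2_base - 1.0


def mean_relative_improvement(r2_new: list[float], r2_base: list[float]) -> float:
    """Unweighted mean of per-trait relative improvements (one possible scheme)."""
    if len(r2_new) != len(r2_base) or not r2_new:
        raise ValueError("need matching, non-empty lists of per-trait R^2")
    gains = [relative_improvement(n, b) for n, b in zip(r2_new, r2_base)]
    return sum(gains) / len(gains)


# Invented per-trait R^2 values for two hypothetical traits.
new = [0.062, 0.030]
base = [0.050, 0.025]
print(f"{mean_relative_improvement(new, base):+.0%}")  # prints +22%
```

Averaging ratios per trait gives each trait equal weight. Taking the ratio of mean R² values would instead let high-heritability traits dominate, so the two schemes can give different summary numbers.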

Updated: 2021-08-23