Deep neural network improves the estimation of polygenic risk scores for breast cancer, Journal of Human Genetics


Deep neural network improves the estimation of polygenic risk scores for breast cancer
Journal of Human Genetics (IF 2.6), Pub Date: 2020-10-02, DOI: 10.1038/s10038-020-00832-7
Adrien Badré¹, Li Zhang², Wellington Muchero³, Justin C Reynolds¹, Chongle Pan¹,²


Polygenic risk scores (PRS) estimate an individual's genetic risk for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimating breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred. In the test cohort with 50% prevalence, the area under the receiver operating characteristic curve (AUC) was 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into two subpopulations: a high-genetic-risk case subpopulation with an average PRS significantly higher than that of the control population, and a normal-genetic-risk case subpopulation with an average PRS similar to that of the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p values by association studies but were important for DNN prediction. These variants may be associated with the phenotype through nonlinear relationships.
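The abstract quotes precision and recall at two different prevalences, so it may help to see how precision depends on prevalence. The following is a minimal sketch, not the authors' procedure. It assumes that, for a fixed score threshold, recall and the false positive rate stay the same when the case/control mix changes. The function names are hypothetical.

```python
def fpr_from_precision(recall, precision, prevalence):
    """False positive rate implied by a (recall, precision) pair at a given prevalence."""
    true_pos = recall * prevalence                       # fraction of the cohort that is a true positive
    false_pos = true_pos * (1.0 - precision) / precision  # from precision = TP / (TP + FP)
    return false_pos / (1.0 - prevalence)                # FP as a fraction of controls


def precision_at_prevalence(recall, fpr, prevalence):
    """Precision of a fixed operating point (recall, FPR) at a new prevalence."""
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Operating point reported for the test cohort (50% prevalence).
    fpr = fpr_from_precision(recall=0.188, precision=0.90, prevalence=0.50)
    # Assumption: the same threshold is applied to a population with 12% prevalence.
    ppv = precision_at_prevalence(recall=0.188, fpr=fpr, prevalence=0.12)
    print(f"implied FPR: {fpr:.4f}, precision at 12% prevalence: {ppv:.3f}")
```

Under this assumption, the 90%-precision threshold keeps its 18.8% recall at 12% prevalence, but its precision drops. The abstract's extrapolated figure (65.4% recall at 20% precision) has a different recall, so it describes a different, more permissive threshold rather than the same one.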


Updated: 2020-10-02
