Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status.
Genetic Epidemiology (IF 1.7), Pub Date: 2020-01-10, DOI: 10.1002/gepi.22279
Damian Gola, Jeannette Erdmann, Bertram Müller-Myhsok, Heribert Schunkert, Inke R. König

Coronary artery disease (CAD) is the leading global cause of mortality and has substantial heritability with a polygenic architecture. Recent risk prediction approaches have been based on polygenic risk scores (PRS), which do not take possible nonlinear effects into account and are restricted to genetic loci associated with CAD. We benchmarked PRS, (penalized) logistic regression, naïve Bayes (NB), random forests (RF), support vector machines (SVM), and gradient boosting (GB) on a data set of 7,736 CAD cases and 6,774 controls from Germany to identify the algorithms that classify CAD status most accurately. The final models were tested on an independent data set from Germany (527 CAD cases and 473 controls). We found PRS to be the best algorithm, yielding an area under the receiver operating characteristic curve (AUC) of 0.92 (95% CI [0.90, 0.95], 50,633 loci) in the German test data. NB and SVM (AUC ~ 0.81) performed better than RF and GB (AUC ~ 0.75). We conclude that using PRS to predict CAD is superior to machine learning methods.
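
As a rough illustration of the two quantities the abstract relies on, the sketch below computes a PRS as a weighted sum of risk-allele counts and an AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. It is not the authors' pipeline: the genotypes are simulated, and the weights, allele frequencies, sample sizes, and function names are hypothetical choices made only for this example.

```python
import random


def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per locus)."""
    return sum(w * g for w, g in zip(weights, allele_counts))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def simulate_genotype(rng, allele_freqs):
    """Draw two alleles per locus; the count of risk alleles is 0, 1, or 2."""
    return [sum(rng.random() < f for _ in range(2)) for f in allele_freqs]


# Everything below is simulated toy data, not data from the study.
rng = random.Random(0)
n_loci = 50
# Hypothetical per-locus effect sizes (for example, log odds ratios).
weights = [rng.gauss(0.0, 0.2) for _ in range(n_loci)]
control_freqs = [0.30] * n_loci
# Assumption: cases carry risk-increasing alleles slightly more often.
case_freqs = [0.30 + (0.05 if w > 0 else -0.05) for w in weights]

case_scores = [
    polygenic_risk_score(simulate_genotype(rng, case_freqs), weights)
    for _ in range(200)
]
control_scores = [
    polygenic_risk_score(simulate_genotype(rng, control_freqs), weights)
    for _ in range(200)
]

print("toy AUC:", round(auc(case_scores, control_scores), 3))
```

The pairwise AUC here is quadratic in the sample size, which is fine for a toy example; a rank-based computation would be the usual choice for data sets the size of those in the abstract.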

Updated: 2020-01-10