DNA methylation biomarker selected by an ensemble machine learning approach predicts mortality risk in an HIV-positive veteran population
Epigenetics ( IF 2.9 ) Pub Date : 2020-10-22 , DOI: 10.1080/15592294.2020.1824097
Chang Shu 1, 2 , Amy C Justice 2, 3 , Xinyu Zhang 1, 2 , Vincent C Marconi 4 , Dana B Hancock 5 , Eric O Johnson 5, 6 , Ke Xu 1, 2

ABSTRACT

Background: With the improved life expectancy of people living with HIV (PLWH), identifying vulnerable subpopulations at high mortality risk is important. Evidence shows that DNA methylation (DNAm) is associated with mortality in non-HIV populations. Here, we established a panel of DNAm biomarkers that can predict mortality risk among PLWH.

Methods: A total of 1,081 HIV-positive participants from the Veterans Ageing Cohort Study (VACS) were divided into training (N = 460), validation (N = 114), and testing (N = 507) sets. The VACS index was used as a measure of mortality risk among PLWH. Model training and fine-tuning were conducted using the ensemble method in the training and validation sets, and prediction performance was assessed in the testing set. A survival analysis comparing the predicted high and low mortality risk groups and a Gene Ontology enrichment analysis of the predictive CpG sites were also performed.
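
The three-way split and the idea of combining base models can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how participants were assigned to sets or how base-model outputs were combined, so the random shuffle, the fixed seed, and the simple probability averaging (soft voting) are all assumptions, and the function names `split_cohort` and `ensemble_probability` are hypothetical.

```python
import random


def split_cohort(ids, n_train=460, n_valid=114, n_test=507, seed=0):
    """Randomly partition participant IDs into training, validation, and testing sets.

    The set sizes come from the abstract; random assignment is an assumption.
    """
    ids = list(ids)
    if len(ids) != n_train + n_valid + n_test:
        raise ValueError("set sizes must sum to the cohort size")
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]
    return train, valid, test


def ensemble_probability(base_probabilities):
    """Average the predicted probabilities of several base models (soft voting).

    Soft voting is an assumed combination rule, used here only for illustration.
    """
    if not base_probabilities:
        raise ValueError("need at least one base-model probability")
    return sum(base_probabilities) / len(base_probabilities)


train, valid, test = split_cohort(range(1081))
print(len(train), len(valid), len(test))
```

In such a setup the training set would fit the base models, the validation set would tune them, and the testing set would be held out until the final performance estimate.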

Results: We selected a panel of 393 CpGs for the ensemble prediction model that showed excellent performance in predicting high mortality risk, with an auROC of 0.809 (95% CI: 0.767, 0.851) and a balanced accuracy of 0.653 (95% CI: 0.611, 0.693) in the testing set. The high mortality risk group was significantly associated with 10-year mortality (hazard ratio = 1.79, p = 4E-05) compared with the low-risk group. These 393 CpGs were located in 280 genes enriched in immune and inflammation response pathways.
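
For readers unfamiliar with the two reported metrics, the sketch below computes them from scratch. It uses the standard definitions (auROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, and balanced accuracy as the mean of sensitivity and specificity). The labels, scores, and the 0.5 threshold are invented toy values, not data from the study, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    return (tp / n_pos + tn / n_neg) / 2


# Toy example (invented values): 1 = high mortality risk, 0 = low mortality risk.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(auroc(labels, scores), 3))
print(round(balanced_accuracy(labels, predictions), 3))
```

Because auROC is threshold-free while balanced accuracy depends on a single cut-off, the two numbers can differ noticeably for the same model, as they do in the results reported above.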

Conclusions: We identified a panel of DNAm features associated with mortality risk in PLWH. These DNAm features may serve as predictive biomarkers for mortality risk among PLWH.

Abbreviations: AUC: Area Under Curve; CI: Confidence interval; DMR: differentially methylated region; DNA: Deoxyribonucleic acid; DNAm: DNA methylation; DAVID: Database for Annotation, Visualization, and Integrated Discovery; EWA: epigenome-wide association; FDR: False discovery rate; FWER: Family-wise error rate; GLMNET: elastic-net-regularized generalized linear models; GO: Gene ontology; HIV: Human immunodeficiency virus; HM450K: Human Methylation 450 K BeadChip; k-NN: k-nearest neighbours; NK: Natural killer; PC: Principal component; PLWH: people living with HIV; QC: Quality control; SVM: Support Vector Machines; VACS: Veterans Ageing Cohort Study; XGBoost: Extreme Gradient Boosting Tree




Updated: 2020-10-22