Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients,Open Heart


Open Heart (IF 2.8), Pub Date: 2021-10-01, DOI: 10.1136/openhrt-2021-001802
Ashish Sarraju ₁, Andrew Ward ₂, Sukyung Chung ₃, Jiang Li ₃, David Scheinker ₄,₅, Fàtima Rodríguez ₆


Objectives: Identifying high-risk patients is crucial for effective cardiovascular disease (CVD) prevention. It is not known whether electronic health record (EHR)-based machine-learning (ML) models can improve CVD risk stratification compared with a secondary prevention risk score developed from randomised clinical trials (Thrombolysis in Myocardial Infarction Risk Score for Secondary Prevention, TRS 2°P).

Methods: We identified patients with CVD in a large health system, including atherosclerotic CVD (ASCVD), split into 80% training and 20% test sets. A rich set of EHR patient features was extracted. ML models were trained to estimate 5-year CVD event risk: random forests (RF), gradient-boosted machines (GBM), extreme gradient-boosted models (XGBoost), and logistic regression with an L2 penalty and with an L1 penalty (Lasso). ML models and TRS 2°P were evaluated by the area under the receiver operating characteristic curve (AUC).

Results: The cohort included 32 192 patients (median age 74 years; 46% female, 63% non-Hispanic white and 12% Asian patients; 23 475 patients with ASCVD). There were 4010 events over 5 years of follow-up. ML models demonstrated good overall performance; XGBoost demonstrated AUC 0.70 (95% CI 0.68 to 0.71) in the full CVD cohort and AUC 0.71 (95% CI 0.69 to 0.73) in patients with ASCVD, with comparable performance by GBM, RF and Lasso. TRS 2°P performed poorly in all CVD (AUC 0.51, 95% CI 0.50 to 0.53) and ASCVD (AUC 0.50, 95% CI 0.48 to 0.52) patients. ML identified nontraditional predictive variables, including education level and primary care visits.

Conclusions: In a multiethnic real-world population, EHR-based ML approaches significantly improved CVD risk stratification for secondary prevention.

No data are available. The data analysed during the current study are not publicly available. Due to reasonable privacy and security concerns, the underlying EHR data are not easily redistributable to researchers other than those engaged in the Institutional Review Board (IRB)-approved research collaborations in the current project. The corresponding author may be contacted for access to EHR data for an IRB-approved collaboration.
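The abstract reports each model's discrimination as an AUC with a 95% CI on a held-out 20% test set. The following is a minimal sketch of how such a figure could be computed, not the authors' pipeline. The abstract does not say how the confidence intervals were obtained, so the percentile bootstrap here is an assumption. The labels and risk scores are synthetic, and the function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U): the probability that a randomly
    chosen event case gets a higher score than a randomly chosen
    non-event case, with tied scores counted as one half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC (assumed method)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in data: binary 5-year event labels and model risk scores.
rng = random.Random(42)
n = 2000
labels = [1 if rng.random() < 0.12 else 0 for _ in range(n)]
scores = [rng.gauss(0.7 if y else 0.0, 1.0) for y in labels]

# 80% / 20% split; only the held-out 20% is used for evaluation.
indices = list(range(n))
rng.shuffle(indices)
test_idx = indices[int(0.8 * n):]
y_test = [labels[i] for i in test_idx]
s_test = [scores[i] for i in test_idx]

point = auc(y_test, s_test)
lo, hi = bootstrap_auc_ci(y_test, s_test)
print(f"AUC {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An AUC near 0.50, as reported for TRS 2°P, means the score ranks event and non-event patients no better than chance, while 0.70 means an event patient outranks a non-event patient about 70% of the time.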


Updated: 2021-10-20
