Feature selection and risk prediction for patients with coronary artery disease using data mining
Medical & Biological Engineering & Computing (IF 2.6), Pub Date: 2020-11-06, DOI: 10.1007/s11517-020-02268-9
Nashreen Md Idris 1 , Yin Kia Chiam 1 , Kasturi Dewi Varathan 2 , Wan Azman Wan Ahmad 3 , Kok Han Chee 3 , Yih Miin Liew 4

Coronary artery disease (CAD) is a major cause of mortality worldwide. Early prediction of CAD risk could reduce the death rate by enabling early, targeted treatment. Several healthcare studies have applied data mining techniques and machine learning algorithms to CAD risk prediction, using patient data collected by hospitals and medical centers. However, most of these studies used every attribute in their datasets, which may degrade the performance of the prediction models because of data redundancy. The objective of this research is to identify significant features for building models that predict the risk level of patients with CAD. Significant features were selected using three methods: the Chi-squared test, recursive feature elimination, and an embedded decision tree. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance in the dataset. The prediction models were built from the identified significant features and eight machine learning algorithms, using Acute Coronary Syndrome (ACS) datasets provided by the National Cardiovascular Disease Database (NCVD) Malaysia. The models were evaluated and compared using six performance evaluation metrics, and the top-performing models achieved an AUC above 90%.
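Two of the steps named in the abstract, Chi-squared feature scoring and SMOTE oversampling, can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names `chi2_score` and `smote`, the parameter `k`, and the toy data are all assumptions made for illustration, and the Chi-squared score here is the plain Pearson statistic on a feature-by-class contingency table, which may differ from what the study's tooling computes.

```python
import math
import random
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic for one categorical feature vs. the class label.

    Larger values suggest a stronger association with the label.
    """
    n = len(labels)
    observed = Counter(zip(feature, labels))   # contingency-table cell counts
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n   # count expected under independence
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between minority-class neighbours.

    `minority` is a list of numeric tuples and must hold at least two points.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up data (hypothetical; not taken from the NCVD registry).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
smoker = [1, 1, 1, 1, 0, 0, 0, 0]  # perfectly associated with the label
sex = [1, 0, 1, 0, 1, 0, 1, 0]     # independent of the label

print(chi2_score(smoker, labels))  # 8.0
print(chi2_score(sex, labels))     # 0.0

minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority_points, n_new=5, k=2)
print(len(new_points))             # 5
```

In this toy example the feature that tracks the label scores higher and would be retained, while the independent one scores zero. Each synthetic point lies on a line segment between two existing minority samples, which is the core idea of SMOTE. A common practice, assumed here rather than stated in the abstract, is to oversample only the training split so that synthetic points do not leak into evaluation.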




Updated: 2020-11-06