Data preprocessing for heart disease classification: A systematic literature review. Computer Methods and Programs in Biomedicine

Computer Methods and Programs in Biomedicine (IF 6.1), Pub Date: 2020-07-03, DOI: 10.1016/j.cmpb.2020.105635
H Benhar, A Idri, J L Fernández-Alemán

Context

Early detection of heart disease is an important challenge, since heart diseases claim 17.3 million lives every year. Moreover, any error in the diagnosis of cardiac disease can be dangerous and put a patient's life at risk, so accurate diagnosis is critical in cardiology. Data mining (DM) classification techniques have been used to diagnose heart diseases, but they are still limited by data quality problems such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques are therefore used to prepare the data, with the goal of improving the performance of DM-based heart disease prediction systems.
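
As an illustration of two of the data quality problems listed above (missing data and features on very different scales), the following is a minimal sketch that uses only the Python standard library. The function names `impute_median` and `min_max_scale` and the cholesterol readings are hypothetical and chosen for illustration; they are not taken from the review or from any of the studies it covers.

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical serum cholesterol readings (mg/dL) with two missing entries.
cholesterol = [233, None, 250, 204, None, 354]

cleaned = impute_median(cholesterol)  # missing values become the median, 241.5
scaled = min_max_scale(cleaned)       # smallest value maps to 0.0, largest to 1.0
```

Median imputation is only one possible choice here. It is less sensitive to outliers than the mean, but it ignores relationships between features.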

Objective

The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification with regard to: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on classification performance in cardiology, (3) the overall performance of classifiers when DP techniques are used, and (4) comparisons of different classifier-preprocessing combinations in terms of accuracy rate.

Method

A systematic literature review was carried out by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published between January 2000 and June 2019. A total of 49 studies were selected and analyzed according to the aforementioned criteria.

Results

The review results show that data reduction is the most frequently used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations, such as (ANN + PCA), (ANN + CHI) and (SVM + PCA), are promising in terms of accuracy. However, the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to their lack of interpretability.
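
To make the idea of data reduction concrete, the sketch below ranks categorical features by their Pearson chi-squared statistic against the class label and keeps the top k. It assumes that "CHI" in the combinations above refers to chi-squared feature selection, which the abstract does not spell out. The helper names `chi2_score` and `select_top_k` and the toy data are hypothetical, and the code is not the procedure of any reviewed study.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and class labels."""
    n = len(labels)
    feat_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))  # absent cells count as 0
    score = 0.0
    for f, f_count in feat_counts.items():
        for c, c_count in label_counts.items():
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest chi-squared score."""
    ranked = sorted(
        features,
        key=lambda name: chi2_score(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: 1 = disease, 0 = no disease.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "chest_pain": ["yes", "yes", "yes", "no", "no", "no", "no", "yes"],
    "sex": ["m", "f", "m", "f", "m", "f", "m", "f"],
}

selected = select_top_k(features, labels, k=1)
```

In this toy data, `sex` is distributed identically in both classes and scores 0, so `chest_pain` is the feature that is kept. A classifier such as an ANN would then be trained on the reduced feature set only.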

Updated: 2020-07-03
