Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus
Multimedia Systems (IF 3.5), Pub Date: 2021-06-06, DOI: 10.1007/s00530-021-00817-2
Chandrashekhar Azad, Bharat Bhushan, Rohit Sharma, Achyut Shankar, Krishna Kant Singh, Aditya Khamparia

Diabetes mellitus is a well-known chronic disease that impairs the human body's ability to produce insulin. The resulting high blood sugar level can lead to complications such as eye damage, nerve damage, cardiovascular damage, kidney damage and stroke. Although diabetes has attracted considerable research attention, machine learning classifiers for this kind of medical data still perform relatively poorly, mainly because of class imbalance and missing values in the data. In this paper, we propose a novel Prediction Model using Synthetic Minority Oversampling Technique, Genetic Algorithm and Decision Tree (PMSGD) for classification of diabetes mellitus on the Pima Indians Diabetes Database (PIDD). The framework of the proposed PMSGD model is composed of four layers. The first layer is the pre-processing layer, which handles missing values, detects outliers and oversamples the minority class. In the second layer, the most significant features are selected using correlation and a genetic algorithm. In the third layer, the proposed model is trained, and in the fourth layer its effectiveness is evaluated in terms of classification accuracy (CA), classification error (CE), precision, recall (sensitivity), F-measure (FM) and area under the ROC curve (AUROC). The proposed PMSGD algorithm outperforms its counterparts: its best results in terms of CA, CE, precision, sensitivity, FM and AUROC are 82.1256%, 17.8744%, 0.8070, 0.8598, 0.8326 and 0.8511, respectively. The simulation results show the effectiveness of the proposed PMSGD model, whose reduced error rate can thereby support the decision-making process.
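
As a reading aid for the evaluation layer, the following is a minimal sketch of how these metrics are conventionally derived from a binary confusion matrix. It uses the standard textbook definitions, not code from the paper. The function names and the confusion-matrix counts in the usage lines are hypothetical and chosen only for illustration. The last lines recompute the F-measure and classification error from the reported sensitivity and accuracy, reading the reported precision as the fraction 0.8070.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from a binary confusion matrix.

    CA and CE are returned as percentages; precision, recall and
    F-measure are returned as fractions in [0, 1].
    """
    total = tp + fp + fn + tn
    accuracy = 100.0 * (tp + tn) / total          # classification accuracy (CA)
    error = 100.0 - accuracy                      # classification error (CE)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # also called sensitivity
    return {
        "CA": accuracy,
        "CE": error,
        "precision": precision,
        "recall": recall,
        "FM": f_measure(precision, recall),
    }


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
example = classification_metrics(tp=40, fp=10, fn=10, tn=40)

# Consistency check against the figures quoted in the abstract.
reported_fm = f_measure(0.8070, 0.8598)
reported_ce = 100.0 - 82.1256
print(round(reported_fm, 4), round(reported_ce, 4))
```

Under these definitions the quoted F-measure and classification error follow directly from the quoted precision, sensitivity and accuracy, so the six reported figures are mutually consistent.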




Updated: 2021-06-07