Early and accurate prediction of diabetics based on FCBF feature selection and SMOTE
International Journal of System Assurance Engineering and Management, Pub Date: 2021-06-23, DOI: 10.1007/s13198-021-01174-z
Amit Kishor, Chinmay Chakraborty

Diabetes is a chronic hyperglycemic disorder. Hundreds of millions of people around the world live with diabetes. Irrelevant features and class imbalance in the dataset are significant obstacles to training a predictive model. Patient medical records quantify symptoms, body characteristics, and clinical laboratory test values, which can be used in biostatistical studies aimed at identifying patterns or characteristics that current practice cannot detect. This work proposes a machine learning-based healthcare model for accurate and early detection of diabetes. Five machine learning classifiers are used: logistic regression, K-nearest neighbor, Naïve Bayes, random forest (RF), and support vector machine. Fast correlation-based filter (FCBF) feature selection removes the irrelevant features, and the synthetic minority over-sampling technique (SMOTE) balances the dataset. The model is evaluated with four performance metrics: accuracy, sensitivity, specificity, and area under the curve (AUC). Experimental results show that only a few relevant features are needed to enhance the accuracy of the developed model. The RF classifier achieves the highest accuracy, sensitivity, specificity, and AUC: 97.81%, 99.32%, 98.86%, and 99.35%, respectively.
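As a rough illustration of two ingredients named in the abstract, the sketch below shows the core idea behind SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) and how accuracy, sensitivity, and specificity follow from a confusion matrix. It is not the authors' implementation: the function names `smote` and `binary_metrics`, the toy data, and the default parameters are assumptions made for this example. It does not cover FCBF, AUC, or the classifiers themselves.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels in {0, 1}.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy data, invented for illustration only
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
new_points = smote(minority, n_new=4, k=2)
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

In practice, oversampling should be applied only to the training split, so that synthetic samples do not leak into the data used for evaluation.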




Updated: 2021-06-23