Abstract
Background
Medical data are of vital importance: by applying appropriate data mining (DM) techniques, they can be used to identify people who are prone to a specific complication or disease. DM is applied specifically to extract details for the diagnosis, prediction, prevention, and treatment of various diseases. According to the International Diabetes Federation (IDF) Diabetes Atlas 2019, diabetes caused 4.2 million deaths worldwide; hence, it is critical to diagnose diabetes at an early stage.
Material and method
Although many techniques are available for diagnosing diabetes, they are not efficient at finding hidden patterns with the accuracy required for correct decision-making. This paper therefore presents an integrated approach that combines the synthetic minority oversampling technique (SMOTE) and the sequential minimal optimization (SMO) algorithm to predict diabetes. In the proposed two-phase classification model, the first phase pre-processes the data with the SMOTE algorithm, and the second phase classifies it with an SMO classifier. The output of the pre-processing phase is passed to SMO to improve the performance of the classifier.
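The two-phase pipeline described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the SMOTE or SMO settings, so the sketch assumes k = 5 nearest neighbours for SMOTE, a linear-kernel SVM, and the simplified SMO variant that picks the second Lagrange multiplier at random (rather than Platt's full selection heuristics). The helper names `smote`, `smo_train`, and `smo_predict` and the small two-feature dataset are hypothetical and stand in for PIDD.

```python
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """SMOTE sketch: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = sorted((m for m in minority if m is not x),
                        key=lambda m: sum((a - c) ** 2 for a, c in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (c - a) for a, c in zip(x, neighbour)])
    return synthetic


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=10, max_iter=1000, rng=None):
    """Simplified SMO for a linear-kernel SVM; labels y must be -1 or +1."""
    rng = rng or random.Random(0)
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # Gram matrix
    alphas, b = [0.0] * n, 0.0

    def error(i):
        # E_i = f(x_i) - y_i, using the current alphas and bias b
        return sum(alphas[m] * y[m] * K[m][i] for m in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # Optimise only multipliers that violate the KKT conditions.
            if not ((y[i] * E_i < -tol and alphas[i] < C) or
                    (y[i] * E_i > tol and alphas[i] > 0)):
                continue
            j = rng.choice([m for m in range(n) if m != i])  # random partner
            E_j = error(j)
            ai_old, aj_old = alphas[i], alphas[j]
            # Box constraints keep 0 <= alpha <= C and sum(alpha * y) fixed.
            if y[i] != y[j]:
                L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            aj = min(H, max(L, aj_old - y[j] * (E_i - E_j) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + y[i] * y[j] * (aj_old - aj)
            b1 = (b - E_i - y[i] * (ai - ai_old) * K[i][i]
                  - y[j] * (aj - aj_old) * K[i][j])
            b2 = (b - E_j - y[i] * (ai - ai_old) * K[i][j]
                  - y[j] * (aj - aj_old) * K[j][j])
            alphas[i], alphas[j] = ai, aj
            b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alphas, b


def smo_predict(X, y, alphas, b, x):
    score = sum(a * yi * dot(xi, x) for a, yi, xi in zip(alphas, y, X)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data (not PIDD): 6 majority (-1) vs 3 minority (+1) rows.
majority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [1.0, 2.5], [2.5, 2.0], [2.0, 2.5]]
minority = [[6.0, 6.0], [7.0, 6.5], [6.5, 7.5]]

# Phase 1: oversample the minority class until the classes are balanced.
synthetic = smote(minority, len(majority) - len(minority), k=5)
X = majority + minority + synthetic
y = [-1] * len(majority) + [1] * (len(minority) + len(synthetic))

# Phase 2: train the SMO classifier on the balanced data.
alphas, b = smo_train(X, y, C=1.0)
print(smo_predict(X, y, alphas, b, [6.8, 6.8]))  # minority-like point
print(smo_predict(X, y, alphas, b, [1.5, 1.5]))  # majority-like point
```

In this sketch SMOTE only ever sees training rows. The same rule would carry over to evaluation: if accuracy is estimated with cross-validation, oversampling would be applied inside each training fold so that no synthetic point is derived from a test row.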
Result
The proposed classification model achieved an accuracy of 99.07% on the PIMA Indian diabetes dataset (PIDD). For this work, PIDD was taken from the UCI repository; the dataset itself is owned by the National Institute of Diabetes and Digestive and Kidney Diseases. It contains the records of 768 female patients, each described by 8 numeric attributes and one decision (class) attribute.
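For reference, accuracy is the share of correctly classified records, computed from the confusion matrix (true/false positives TP, FP and true/false negatives TN, FN) as shown below. The second line uses the known class split of PIDD (500 non-diabetic and 268 diabetic records) to give the accuracy of a trivial classifier that always predicts the majority class; it is a contextual baseline, not a figure reported in this study.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Accuracy}_{\text{majority baseline}} = \frac{500}{768} \approx 0.651 = 65.1\%$$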
Conclusion
The results of the study confirm that the proposed integrated DM approach could be used as an expert system for diagnosing diabetes in patients at an early stage. The features extracted in this study will be used to develop a prognostic tool, in the form of a mobile application, for early diabetes detection.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Naz, H., Ahuja, S. SMOTE-SMO-based expert system for type II diabetes detection using PIMA dataset. Int J Diabetes Dev Ctries 42, 245–253 (2022). https://doi.org/10.1007/s13410-021-00969-x
DOI: https://doi.org/10.1007/s13410-021-00969-x