SMOTE-SMO-based expert system for type II diabetes detection using PIMA dataset

  • Original Article
International Journal of Diabetes in Developing Countries

Abstract

Background

Medical data, which is critical to human existence, is used to identify people prone to a specific complication or disease by applying appropriate data mining (DM) techniques. In medicine, DM is applied to extract information for the diagnosis, prediction, prevention, and treatment of various diseases. According to the International Diabetes Federation (IDF) 2019 atlas report, diabetes caused 4.2 million deaths worldwide; hence, it is critical to diagnose diabetes at an early stage.

Material and method

Although many techniques are available to diagnose diabetes, existing methods do not find hidden patterns efficiently enough to reach the accuracy required for correct decision-making. This paper therefore presents an integrated approach that combines the synthetic minority oversampling technique (SMOTE) with the sequential minimal optimization (SMO) algorithm for predicting diabetes. In the proposed two-phase classification model, the first phase pre-processes the data with the SMOTE algorithm, and the second phase applies the SMO classifier. The pre-processed output is passed to SMO to improve the performance of the classifier.
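The SMOTE pre-processing phase can be illustrated with a minimal sketch. SMOTE creates each synthetic minority-class sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The code below is a pure-Python illustration of that idea only, not the authors' implementation: the function name `smote`, its parameters, and the toy data are assumptions made for this example, and the SMO classifier phase is not shown.

```python
import random


def smote(minority, per_sample=1, k=5, seed=0):
    """Return synthetic points interpolated between minority-class neighbours.

    minority   -- list of feature vectors (lists of floats) from the minority class
    per_sample -- synthetic points generated per original minority point
    k          -- number of nearest minority neighbours to choose from
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for i, base in enumerate(minority):
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbours = others[:k]
        for _ in range(per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment base -> neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 minority points, to be doubled to 6.
minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
new_points = smote(minority, per_sample=1, k=2)
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

In a two-phase setup like the one described, the oversampled training data would then be handed to the classifier. Oversampling is normally applied to the training portion only, so that synthetic points do not leak into the evaluation data.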

Result

The proposed classification model achieved an accuracy of 99.07% on the PIMA Indian diabetes dataset (PIDD). The PIDD was obtained from the UCI repository for this work; the dataset is owned by the National Institute of Diabetes and Digestive and Kidney Diseases. It contains records for 768 female patients, each described by 8 numeric attributes and one decision class attribute.

Conclusion

The results of the study confirm that the proposed integrated DM approach could be used as an expert system for diagnosing diabetes in patients at an early stage. The features extracted in this study will be used to develop a prognostic tool, in the form of a mobile application, for early diabetes detection.



Author information

Corresponding author

Correspondence to Huma Naz.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Naz, H., Ahuja, S. SMOTE-SMO-based expert system for type II diabetes detection using PIMA dataset. Int J Diabetes Dev Ctries 42, 245–253 (2022). https://doi.org/10.1007/s13410-021-00969-x


  • DOI: https://doi.org/10.1007/s13410-021-00969-x
