Performance Analysis of Machine Learning Algorithms for Thyroid Disease

  • Research Article-Electrical Engineering
Arabian Journal for Science and Engineering

Abstract

Thyroid disease arises from an anomalous growth of thyroid tissue at the edge of the thyroid gland. Thyroid disorders typically occur when the gland releases abnormal amounts of hormones; hypothyroidism (an underactive thyroid gland) and hyperthyroidism (an overactive thyroid gland) are the two main types. This study proposes using machine learning classifiers to detect and diagnose thyroid disease and compares them on accuracy and other performance metrics. It presents an extensive analysis of five classifiers, namely K-nearest neighbor (KNN), Naïve Bayes, support vector machine, decision tree and logistic regression, each implemented with and without feature selection. The thyroid data were collected from DHQ Teaching Hospital, Dera Ghazi Khan, Pakistan. The dataset differs from those used in existing studies in that it includes three additional features: pulse rate, body mass index and blood pressure. The experiment comprised three iterations: the first used no feature selection, while the second and third used L1-based and L2-based feature selection, respectively. The classifiers were evaluated on several measures, including accuracy, precision and the receiver operating characteristic (ROC) curve with area under the curve (AUC). The results indicate that classifiers combined with L1-based feature selection achieved higher overall accuracy (Naïve Bayes 100%, logistic regression 100% and KNN 97.84%) than those with no feature selection or with L2-based feature selection.
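To make the workflow in the abstract concrete, the following is a minimal sketch of L1-based feature selection followed by a classifier and the evaluation measures named above. It is not the authors' implementation: scikit-learn is assumed as the library, the data are synthetic (generated with `make_classification`, not the hospital dataset), and the selector model (`LinearSVC` with an L1 penalty), the value `C=0.05`, and the function name `run_l1_selection_demo` are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def run_l1_selection_demo(seed=0):
    """Fit an L1-selection + Naive Bayes pipeline on synthetic data (hypothetical demo)."""
    # Synthetic stand-in: 10 features, only 3 of which carry class information.
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=3, n_redundant=0,
        n_clusters_per_class=1, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # The L1 penalty drives some coefficients to exactly zero;
    # SelectFromModel keeps only the features with non-zero weight.
    selector = SelectFromModel(
        LinearSVC(C=0.05, penalty="l1", dual=False, random_state=seed)
    )
    pipeline = Pipeline([
        ("scale", StandardScaler()),   # put features on a common scale first
        ("select", selector),          # L1-based feature selection
        ("clf", GaussianNB()),         # any of the five classifiers could go here
    ])
    pipeline.fit(X_train, y_train)

    y_pred = pipeline.predict(X_test)
    y_score = pipeline.predict_proba(X_test)[:, 1]  # scores for the ROC curve
    return {
        "n_selected": int(pipeline.named_steps["select"].get_support().sum()),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }


if __name__ == "__main__":
    print(run_l1_selection_demo())
```

Placing the selector inside the pipeline means it is fitted on the training split only, so the held-out scores are not influenced by features chosen with knowledge of the test data.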

Abbreviations

k: Number of neighboring elements

L1: L1-norm

L2: L2-norm

a, b: Feature vectors

d: Distance
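As an illustration of how these symbols can fit together, the sketch below computes the distance d between two feature vectors a and b under the L1-norm and the L2-norm, then uses it in a k-nearest-neighbor vote. Reading the symbols this way is an assumption, since the full text is not shown here; the function names, sample points and labels are hypothetical.

```python
from collections import Counter


def minkowski_distance(a, b, p):
    """Distance d(a, b) under the Lp norm: p=1 is L1 (Manhattan), p=2 is L2 (Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_points, train_labels, query, k=3, p=2):
    """Return the majority label among the k training points closest to `query`."""
    distances = sorted(
        (minkowski_distance(point, query, p), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with two classes.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]

d_l1 = minkowski_distance((0, 0), (3, 4), p=1)  # |3| + |4| = 7.0
d_l2 = minkowski_distance((0, 0), (3, 4), p=2)  # sqrt(9 + 16) = 5.0
prediction = knn_predict(points, labels, query=(0.5, 0.5), k=3, p=2)  # "A"
```

The choice of norm changes which neighbors count as closest, so the same k can give different predictions under L1 and L2 on less well-separated data.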

Acknowledgements

I would like to extend my sincere gratitude to Dr. Abid Hussain, Dr. Zarnab Lashari and Dr. Aimen Javed for helping to gather the thyroid data and for their continuous support. I am thankful to them for their invaluable guidance.

Author information

Corresponding author

Correspondence to Hafiz Abbad Ur Rehman.

About this article

Cite this article

Abbad Ur Rehman, H., Lin, CY., Mushtaq, Z. et al. Performance Analysis of Machine Learning Algorithms for Thyroid Disease. Arab J Sci Eng 46, 9437–9449 (2021). https://doi.org/10.1007/s13369-020-05206-x

  • DOI: https://doi.org/10.1007/s13369-020-05206-x