PSO-FCM based data mining model to predict diabetic disease

https://doi.org/10.1016/j.cmpb.2020.105659

Highlights

  • A new model is proposed for forecasting type 2 diabetes mellitus (T2DM) based on data mining strategies.

  • The study combines Particle Swarm Optimization (PSO) with Fuzzy C-Means (FCM) clustering into a hybrid PSO-FCM model.

  • The proposed method is evaluated on a set of medical data relating to a diabetes diagnosis challenge.

  • The prototype achieved 8.26 percent higher accuracy than the other methods.

  • The proposed PSO-FCM method delivers greater performance when compared with other models.

Abstract

Background and Objective

Diabetes is typically caused by higher than normal blood sugar levels; equivalently, the production of insulin may be regarded as insufficient. In recent years the percentage of patients affected by diabetes has grown considerably throughout the world. This problem must be taken more seriously in the coming years to ensure that the proportion of diabetes-affected individuals is reduced. Recently, several research teams have conducted detailed research on data mining platforms to compare the precision of different methods. Data mining can apply parametric modeling to health data, including diabetic patient data sets, to synthesize expertise in the field.

Methods

In this study, a new model is proposed for forecasting type 2 diabetes mellitus (T2DM) based on data mining strategies. A combination of Particle Swarm Optimization (PSO) and Fuzzy C-Means (FCM) clustering, referred to as PSO-FCM, is used to evaluate a set of medical data relating to a diabetes diagnosis challenge.

Results

Experiments are performed on the Pima Indians Diabetes Database. The sensitivity, specificity and accuracy metrics widely used in medical studies are used to assess the effectiveness and reliability of the proposed system. The prototype achieved 8.26 percent higher accuracy than the other methods.

Conclusion

The results show that the proposed PSO-FCM method delivers better performance than the other models.

Introduction

People around the world today suffer from many diseases, the most pervasive being diabetes. The main factors behind the rise of this disease include dietary changes, lack of exercise, increased stress and obesity. Diabetes mellitus (DM) arises when the pancreas does not produce adequate insulin or when the body cannot use insulin effectively. A drawback of the condition is that its symptoms are often very mild, which delays diagnosis and leads to other serious conditions such as kidney disease, blindness and nervous system damage. Diabetes can be categorized into four subgroups: (a) Type 1 diabetes; (b) Type 2 diabetes (Type 2 Diabetes Mellitus, T2DM); (c) Prediabetes and (d) Gestational Diabetes [1]. T2DM affects quality of life and longevity and is associated with other chronic diseases such as hypertension, obesity, dyslipidemia, arterial colitis, and angina. Both the short-term and long-term adverse effects associated with T2DM are well known in patients at cardiac risk. Therefore, early diagnosis and prevention of T2DM are very important in preventing many serious and life-threatening complications in patients with cardiovascular disease. Recent studies have shown that lifestyle improvements and medication may prevent the onset of T2DM and its complications. It is therefore very important to identify those at high risk of T2DM in order to establish prevention strategies [2].

In 2015, nearly 415 million people were affected by this disease, and this number may increase in the years to come, according to several reports issued by WHA (World Health Association). Around one in 10 adults is expected to have diabetes. To reduce this, it is essential that diabetes be detected at an early stage and prevented. Communities at high risk of T2DM should be identified and evaluated thoroughly in order to minimize the impact of DM. The various implications of T2DM are shown in Fig. 1.

The following groups are at high risk of DM according to World Health Organization (WHO) standards.

  • Age 45 or older

  • Infrequent exercise

  • Elevated Body Mass Index (BMI) (i.e. 24 kg/m² or greater)

  • Impaired Glucose Tolerance (IGT) / Impaired Fasting Tolerance (IFT)

  • Family history of DM

  • Hypertriglyceridemia (HTG)

  • Hypertension / Cardiac diseases

  • Pregnancy at age 30 or older

Advanced technology is needed to support this. Data mining, also known as KDD (Knowledge Discovery in Databases) [3], is a suitable technology for this study. Generally, data mining is used to discover patterns concealed in large datasets, whether balanced or unbalanced, using a range of sophisticated intelligent techniques [4]. Several approaches are applied for pattern recognition, estimation, clustering and correlation. The reliability of the data and the choice of a proper methodology are the most essential considerations in data mining. From the beginning, data mining has been extended to different domains to derive useful information from large numbers of datasets. Every domain has its own complexity, but the field of early disease forecasting still has potential for improvement. The necessary diagnostic information is available from the patient database kept in each hospital [5]. Building an effective model for the early prediction and prevention of this disease still leaves room for improved performance.

This paper proposes a design that integrates the best features of PSO into the traditional FCM method by changing the weighting of each cluster membership. Under this model, each point in the data set has a distinct weight with respect to every cluster. For noisy data, this weight plays an important role in effective clustering. The article is organized as follows: Section 2 presents related work on diabetes prediction across various data sets. Section 3 describes the dataset chosen for analysis and the proposed prediction methodology. Section 4 discusses the PSO-FCM based data mining model to predict diabetic disease. Section 5 covers the performance evaluation and its comparison with various methods. The conclusion of the article is presented in Section 6 [6].

Literature survey

Many researchers have applied different methodologies and techniques to detect the disease at an early stage and to take the measures needed to lessen its impact on specific groups of people. A few notable works are mentioned in this section. Daghistani and Alshammari [7] explored numerous classification strategies applied to datasets from the Ministry of National Guard Health Affairs (MNGHA), Saudi Arabia. The supervised and unsupervised learning techniques with

Dataset description

A dataset is the prerequisite for any data mining process. Over the years, many researchers have carried out assessments using various kinds of datasets, either existing ones or data gathered by service providers such as hospitals and medical organizations. A number of diabetes datasets with different areas of emphasis are now readily accessible. The Pima Indians Diabetes Database (PIDD) is one of the most prevalent datasets for prediction and is used for

Data mining techniques to predict diabetic disease

The generic FCM algorithm is highly sensitive to noisy data, and its biggest disadvantage is the resulting loss of overall effectiveness. Solving this problem requires a revision of the FCM formulation together with preprocessed input data. This paper suggests an effective system integrating the best attributes of PSO into the conventional FCM process, which improves the weighting of cluster memberships. Every
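For orientation, the conventional FCM procedure that the proposed system builds on can be sketched as below. This is a minimal sketch of the textbook algorithm only, not the authors' PSO-weighted variant; the function name `fcm`, the fuzzifier `m = 2`, the stopping tolerance and the toy points are all assumptions made for illustration.

```python
import numpy as np

def _centers(U, X, m):
    """Cluster centres as membership-weighted means of the points."""
    Um = U ** m
    return (Um.T @ X) / Um.sum(axis=0)[:, None]

def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy c-means; returns (centres, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        centers = _centers(U, X, m)
        # Euclidean distance from every point to every centre (n x c).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return _centers(U, X, m), U

# Made-up 2-D points forming two obvious groups (illustration only).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, U = fcm(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
print(labels)
```

Because every point receives a nonzero membership in every cluster, a single outlier pulls on all centres, which is one way to see the noise sensitivity mentioned above.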

Confusion matrix

Accuracy is measured using a confusion matrix built from four counts: "True Positive" (TP), "False Positive" (FP), "False Negative" (FN) and "True Negative" (TN). Table 4 represents the basic confusion matrix between the actual and the predicted values.

Apart from accuracy, a few other parameters describe prediction performance; they are listed below with their expressions.
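The four counts can be tallied directly from paired actual and predicted labels. The sketch below assumes binary labels with 1 as the positive (diabetic) class and uses the standard accuracy definition, (TP + TN) divided by all cases, since the authors' own expression is not shown in this excerpt. The function names and the label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FP, FN and TN for a binary classification task."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, actually positive
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        elif a == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

def accuracy(counts):
    """Percentage of all cases that were classified correctly."""
    total = sum(counts.values())
    return 100.0 * (counts["TP"] + counts["TN"]) / total

# Hypothetical labels for illustration only (1 = diabetic, 0 = non-diabetic).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)            # TP=3, FP=1, FN=1, TN=3
print(accuracy(counts))  # 75.0
```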

Sensitivity (SN) refers to the true positive rate and is expressed as

$$SN = \frac{TP}{TP + FN} \times 100\%$$
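The sensitivity expression above translates into a one-line function. Specificity is included alongside it using its usual definition, TN / (TN + FP); that formula is an assumption here, because the authors' expression for specificity is not part of this excerpt. The counts in the usage lines are hypothetical.

```python
def sensitivity(tp, fn):
    """SN = TP / (TP + FN) * 100: share of actual positives that are detected."""
    return 100.0 * tp / (tp + fn)

def specificity(tn, fp):
    """TN / (TN + FP) * 100: share of actual negatives correctly rejected.

    Standard definition, assumed here rather than taken from the source.
    """
    return 100.0 * tn / (tn + fp)

# Hypothetical counts for illustration only.
sn = sensitivity(tp=40, fn=10)   # 80.0
sp = specificity(tn=90, fp=10)   # 90.0
print(sn, sp)
```

In a screening setting a false negative (a missed diabetic patient) is usually costlier than a false positive, which is why sensitivity is reported separately from overall accuracy.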

Conclusion

In this paper, PSO-FCM is presented as an efficient model that combines the best attributes of PSO and FCM. The combined PSO and FCM methodology was established after a detailed study of previous work. First, the pbest and gbest parameters are determined, and then effective clustering is carried out with FCM. While FCM is a successful clustering strategy, its precision drops due to premature convergence. The popular PSO
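To make the roles of pbest and gbest concrete, the following is a minimal sketch of standard global-best PSO minimising a generic objective. It is not the authors' implementation: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds and the sphere test function are all assumed values. In a PSO-FCM hybrid the objective would plausibly be a clustering cost evaluated at candidate cluster centres, but that coupling is likewise an assumption.

```python
import numpy as np

def pso(objective, dim, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO; returns (gbest position, gbest value)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    # pbest: best position each particle has visited so far.
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    # gbest: best position found by any particle so far.
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity blends inertia, pull towards pbest and pull towards gbest.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

Because the swarm samples many starting positions at once, it is less dependent on a single initialisation than plain FCM, which is the usual motivation for pairing the two.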

Funding information

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

References (25)

  • O.T. Kavakiotis et al.

    Machine learning and data mining methods in diabetes research

    Comput. Struct. Biotechnol. J.

    (2017)
  • I. Rahul Joshi et al.

    Analysis and prediction of diabetes diseases using machine learning algorithm: ensemble approach

    Int. Res. J. Eng. Technol.

    (2017)