PSO-FCM based data mining model to predict diabetic disease
Introduction
People around the world today suffer from many diseases, and diabetes is among the most pervasive. The factors driving the rise of this disease include dietary changes, lack of exercise, increased stress and obesity. In diabetes mellitus (DM), the pancreas does not produce adequate insulin, or the body cannot use the insulin it produces effectively. A particular danger of the condition is that its symptoms are very mild, which leads to delayed diagnosis and to other serious complications, such as kidney disease, blindness and damage to the nervous system. Diabetes can be categorized into four subgroups: (a) Type 1 diabetes; (b) Type 2 diabetes (referred to as Type 2 Diabetes Mellitus, T2DM); (c) Prediabetes and (d) Gestational Diabetes [1]. T2DM affects quality of life and longevity, together with other chronic diseases such as hypertension, obesity, dyslipidemia, arterial colitis, and angina. Both the short-term and long-term adverse effects associated with T2DM are well known in patients at cardiovascular risk. Therefore, early diagnosis and prevention of T2DM are very important in preventing many serious and life-threatening complications in patients with cardiovascular disease. Recent studies have shown that lifestyle improvements and medication may prevent diabetes complications and the onset of T2DM. It is therefore very important to identify those at high risk of T2DM in order to establish prevention strategies [2].
In 2015, nearly 415 million people were affected by this disease, and this number may increase in the years to come, according to several reports issued by WHA (World Health Association). Around one in 10 adults is expected to have diabetes. To reduce this burden, it is essential that diabetes be detected at an early stage and prevented. Groups at high risk of T2DM should be identified and evaluated thoroughly in order to minimize the impact of DM. The various implications of T2DM are shown in Fig. 1.
According to World Health Organization (WHO) standards, the following groups are at high risk of DM.
- People belonging to the age group of 45 or greater
- Seldom exercising
- Greater Body Mass Index (BMI) (i.e. 24 kg/m²)
- Impaired Glucose Tolerance (IGT) / Impaired Fasting Tolerance (IFT)
- Inheritance of DM
- Hypertriglyceridemia (HTG)
- Hypertension / Cardiac diseases
- Age of pregnant ladies is 30 or greater
Advanced technology is needed to address this problem. Data mining, also known as KDD (Knowledge Discovery in Databases) [3], is a suitable technology for this study. Generally, data mining is used to discover patterns concealed in large datasets using various intelligent techniques [4], on balanced or unbalanced data. Several approaches are applied for pattern recognition, estimation, clustering and correlation. The reliability of the data and the use of a proper methodology are the most essential considerations in data mining. From its beginning, data mining has been extended to different domains to derive useful information from large numbers of datasets. Every domain has its own complexity, and the field of early disease forecasting still has potential for enhancement. The necessary diagnostic information is available from the patient database kept in each hospital [5]. An effective model for the early prediction and prevention of this disease still needs to be built to achieve improved performance.
In this paper, we propose a design that integrates the best features of Particle Swarm Optimization (PSO) into the traditional Fuzzy C-Means (FCM) algorithm and changes the weighting of each cluster membership. In this model, each point in the data set has a distinct weight with respect to every cluster. For noisy data, this weight plays an important role in effective clustering. The article is organized as follows: Section 2 presents related work on diabetes prediction using various data sets. Section 3 describes the dataset chosen for analysis and the proposed prediction methodology. Section 4 discusses the PSO-FCM based data mining model to predict diabetic disease. Section 5 presents the performance evaluation and a comparison with various methods. The conclusion of the article is presented in Section 6 [6].
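As a point of reference for the per-point, per-cluster weights described above, the following is a minimal sketch of the standard FCM membership and centre updates in plain Python. It is not the paper's PSO-modified weighting, which this excerpt does not spell out. The function names, the fuzzifier m = 2 and the toy points in the usage note are assumptions chosen for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    Standard FCM rule: u_ij is proportional to d_ij ** (-2 / (m - 1)),
    normalised so that each point's memberships sum to 1.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # Euclidean distance to every centre, floored to avoid division by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        inv = [d ** -power for d in dists]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each centre as the mean of all points weighted by u ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[k] for w, p in zip(weights, points)) / total
             for k in range(dim)]
        )
    return centers
```

With hypothetical points (0, 0), (1, 0), (10, 0) and centres at (0, 0) and (10, 0), the middle point receives a membership of 81/82 in the first cluster, because it is nine times closer to that centre. Alternating the two functions until the centres stop moving gives the conventional FCM iteration.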
Literature survey
Many researchers have applied different methodologies and techniques to detect the disease at an early stage and to take the necessary measures to lessen its impact on specific groups of people. A few notable works are mentioned in this section. Daghistani and Alshammari [7] explored numerous classification strategies on datasets from the Ministry of National Guard Health Affairs (MNGHA), Saudi Arabia. The supervised and unsupervised learning techniques with
Dataset description
A dataset is the prerequisite for the data mining process. Over the years, many researchers have carried out assessments with various kinds of datasets, using existing datasets and/or data gathered by service providers such as hospitals and medical organizations. A number of diabetes datasets covering different regions of emphasis are now readily accessible. The Pima Indians Diabetes Database (PIDD) is one of the most prevalent of these datasets for prediction and is used for
Data mining techniques to predict diabetic disease
The generic FCM algorithm is highly sensitive to noisy data. Its biggest disadvantage is the resulting loss in overall effectiveness. To solve this problem, the FCM formulation must be revised and supplied with preprocessed data as input. This paper proposes an effective system that integrates the best attributes of PSO into the conventional FCM process, which improves the weighting of the cluster memberships. Every
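One common way to combine the two methods, sketched here under stated assumptions, is to let a basic global-best PSO search for the cluster centres that minimise the standard FCM objective. This is a generic illustration and not the authors' algorithm: the excerpt does not give their update rules, so the function names, the swarm size, the inertia and acceleration coefficients, and the initialisation from random data points are all hypothetical choices.

```python
import math
import random


def fcm_objective(points, centers, m=2.0, eps=1e-12):
    """FCM cost J = sum_i sum_j u_ij**m * d_ij**2, with u from the standard rule."""
    power = 2.0 / (m - 1.0)
    cost = 0.0
    for p in points:
        dists = [max(math.dist(p, c), eps) for c in centers]
        inv = [d ** -power for d in dists]
        total = sum(inv)
        cost += sum((v / total) ** m * d * d for v, d in zip(inv, dists))
    return cost


def pso_fcm(points, n_clusters, n_particles=20, n_iter=100,
            inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for cluster centres with a basic global-best PSO (illustrative)."""
    rng = random.Random(seed)
    dim = len(points[0])
    size = n_clusters * dim  # a particle is the flattened list of centres

    def unflatten(pos):
        return [pos[j * dim:(j + 1) * dim] for j in range(n_clusters)]

    def cost(pos):
        return fcm_objective(points, unflatten(pos))

    # Start each particle at randomly chosen data points, with zero velocity.
    swarm = [[x for p in rng.sample(points, n_clusters) for x in p]
             for _ in range(n_particles)]
    velocity = [[0.0] * size for _ in range(n_particles)]

    # pbest: best position seen by each particle; gbest: best seen by the swarm.
    pbest = [pos[:] for pos in swarm]
    pbest_cost = [cost(pos) for pos in swarm]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(size):
                r1, r2 = rng.random(), rng.random()
                velocity[i][k] = (inertia * velocity[i][k]
                                  + c1 * r1 * (pbest[i][k] - swarm[i][k])
                                  + c2 * r2 * (gbest[k] - swarm[i][k]))
                swarm[i][k] += velocity[i][k]
            current = cost(swarm[i])
            if current < pbest_cost[i]:
                pbest[i], pbest_cost[i] = swarm[i][:], current
                if current < gbest_cost:
                    gbest, gbest_cost = swarm[i][:], current
    return unflatten(gbest), gbest_cost
```

Calling `pso_fcm(points, 2)` on a list of coordinate tuples returns the best centres found and their FCM cost. Because every particle is pulled towards both its own best position and the swarm's best, the search is less dependent on a single starting guess than plain FCM iteration, which is the usual motivation for this kind of hybrid.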
Confusion matrix
Accuracy is measured using a confusion matrix composed of the counts "True Positive" (TP), "False Positive" (FP), "False Negative" (FN) and "True Negative" (TN). Table 4 represents the basic confusion matrix between the actual and the predicted values.
Apart from accuracy, a few other parameters characterize prediction performance; they are listed below with their expressions.
Sensitivity (SN) refers to the True Positive Rate (TPR) and is expressed as SN = TP / (TP + FN).
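A small helper makes the relationship between the four confusion-matrix counts and the derived measures concrete. The sketch below uses the textbook definitions of accuracy, sensitivity and specificity. Specificity is included as an assumption, since the excerpt's own list of measures is cut off, and the function name and the counts in the usage note are invented for illustration, not results from the paper.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive standard measures from the four confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Report an undefined ratio as NaN instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else float("nan")

    return {
        # Share of all cases that were classified correctly.
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        # True positive rate: share of actual positives that were detected.
        "sensitivity": ratio(tp, tp + fn),
        # True negative rate: share of actual negatives that were rejected.
        "specificity": ratio(tn, tn + fp),
    }
```

For example, with made-up counts TP = 45, FP = 5, FN = 15 and TN = 35, the helper gives an accuracy of 0.8, a sensitivity of 0.75 and a specificity of 0.875.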
Conclusion
In this paper, PSO-FCM is presented as an efficient model that combines the best attributes of PSO and FCM. The combined PSO and FCM methodology was established after a detailed study of previous work. First the pbest and gbest parameters are determined, and then the results of effective clustering are passed to FCM. While FCM is a successful clustering strategy, the precision of the system drops due to premature convergence. The popular PSO
Funding information
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
References (25)
- et al. Type 2 diabetes mellitus prediction model based on data mining. Informat. Med. Unlocked (2018)
- et al. Accuracy improvement for diabetes disease classification: a case on a public medical dataset. Fuzzy Inf. Eng. (2017)
- et al. Prediction of diabetes using classification algorithms. Procedia Comput. Sci. (2018)
- et al. Hybrid prediction model for Type-2 diabetic patients. Expert Syst. Appl. (2010)
- Everything You Need to Know about Diabetes. Healthline (2018)
- et al. Machine Learning for the Prediction of New-Onset Diabetes Mellitus during 5-Year Follow-up in Non-Diabetic Patients with Cardiovascular Risks. Yonsei Med. J. (2019)
- "Data mining", En.wikipedia.org, 2020. [Online]. Available: http://en.wikipedia.org/wiki/Data_mining#cite_note-acm-1....
- et al. Post-diagnosis management of diabetes through a mobile health consultation application
- Decoderz, “A novel numerical optimization algorithm inspired from particles: particle swarm, optimization”, Transpire...
- et al. Diagnosis of diabetes by applying data mining classification techniques. Int. J. Adv. Comput. Sci. Appl. (2016)
- Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J.
- Analysis and prediction of diabetes diseases using machine learning algorithm: ensemble approach. Int. Res. J. Eng. Technol.