1 Introduction

Maintaining and improving health is the key to a healthy life [20, 21, 38, 41, 49], but the outbreak of COVID-19 has become a major threat to human life. COVID-19 is a potentially fatal, widespread disease caused by a recently discovered coronavirus (Footnote 1). The disease emerged at the end of 2019 in the Wuhan region of China and is caused by a new member of the coronavirus family. Findings show that COVID-19 spreads from person to person and causes serious respiratory problems in those affected [5, 29, 37]. The World Health Organization (WHO) has declared it a pandemic. COVID-19 is an evolving global challenge and, like other pandemics, it weakens health systems and poses a substantial risk to the global economy and society [16, 45, 58].

Chinese authorities and health experts alerted the WHO on December 31, 2019 to an outbreak of an unknown form of pneumonia in Wuhan, in China's Hubei province. A novel strain of coronavirus was subsequently isolated from a patient on January 7, 2020. The original source of the virus is unknown. On January 21, 2020, the WHO suggested possible sustained human-to-human transmission [36]. Initially, COVID-19 spread only within different regions of China, but it then began to spread to other countries. When the virus started spreading, 600 cases were confirmed in China [36]; more than 424,000 people are now infected globally, and the number of deaths worldwide has risen to 18,900 (Footnote 2). According to the WHO, the most common symptoms are tiredness, fever, and dry cough. People with these mild symptoms can recover without special treatment or medication. However, some patients present additional symptoms: runny nose, sore throat, nasal congestion, aches and pains, or diarrhea. Typically, 80% of people infected with COVID-19 have mild, cold-like symptoms (Footnote 3).

An effective strategy for limiting transmission of the virus is self-quarantine (or self-isolation) once symptoms emerge [14]. The National Health Service (NHS) highlights high fever and a continuous cough as key symptoms. Because the disease is a form of viral pneumonia, antibiotics are not an effective treatment. The NHS suggests that anyone with these symptoms should self-isolate for 7 to 14 days (Footnote 4). The main contributions of this paper are:

  • We present a study of the growing impact of the COVID-19 pandemic.

  • The death rate and risk level of COVID-19 can be minimized if the disease is detected at an early stage. Therefore, we propose an ANFIS-based predictive model for predicting the risk level of COVID-19.

  • The COVID-19 dataset is analyzed and classified based on the consultants’ latest suggestions and the current situation.

  • This paper provides classification results based on parameters for predicting the risk factors of COVID-19 using ANFIS.

  • Several machine learning classifiers are also implemented, and the best classifier for this dataset is selected through a comparative analysis.

  • Results show that the proposed system effectively recognizes COVID-19 individuals and predicts the risk factor of COVID-19.

The rest of the paper is organized as follows. Section 2 covers recent work related to COVID-19. Section 3 describes the proposed system for the prediction of COVID-19 using classification models. Section 4 discusses the evaluation and experimental results, along with a comparative analysis of the classification algorithms. Finally, Sect. 5 concludes the paper.

2 Literature review

COVID-19 spread worldwide during the 2020 pandemic, and a large number of people have been affected by the virus (Footnote 5). Many researchers have explored algorithms that could help combat it. In [19], an SVM classifier and mutual information (MI) techniques were applied to the classification of gene data. The authors claimed that the SVM classifier achieved the best mean accuracy. Furthermore, the authors in [8] applied a fuzzy KNN approach to a Parkinson’s disease dataset and built a diagnostic system that supports better decisions in clinical diagnosis. A statistical learning model was established in 2020 to help doctors forecast which COVID-19 patients would develop respiratory failure requiring mechanical ventilation; it predicted moderate to severe respiratory failure with an accuracy of 84% [12]. The authors in [26] used a Naïve Bayes classifier to improve the accuracy of heart disease risk prediction. Different machine learning techniques, i.e., Artificial Neural Network (ANN), Random Forest (RF), and K-means clustering, were implemented for the prediction of diabetes [2]. The ANN technique provided the best accuracy, 75.7%, which helps experts in the diagnosis of diabetes.

In [23], a small amount of data from various hospitals was collected and used to train deep learning models with blockchain-based federated learning. The proposed solution detects COVID-19 patterns in CT images, and the trained model yields accurate predictions. Similarly, the authors in [44, 54] proposed a patient-centric framework for blockchain-enabled healthcare applications. In [52], machine learning techniques were implemented to predict hypertension outcomes from medical data. In [9], the author used four classification algorithms (SVM, DT, RF, and XGBoost) to assess the system’s accuracy. XGBoost produced the best results among the four classifiers, with a system accuracy of 94.36% [9, 24]. In [18], the authors implemented an ANFIS model to estimate landslide susceptibility, using it for both training and validation of the dataset; the model can be applied to predict landslides under different conditions [18]. In 2017, a system based on SVM and fuzzy logic was proposed to block pornographic content on the web; it automatically detects and blocks adult content for parents’ convenience [3]. SVM has also been used in a statistical learning approach, in a case study where it classifies hypothesis test data and computes the error rate using the Gaussian density function [1, 13]. Sentiment analysis of Twitter data related to the progress of COVID-19 was carried out in 2020; the tweets were classified using machine learning methods, and a classification accuracy of 91% was observed [40].

Quality of Service (QoS) is an essential factor in cloud computing services. QoS data are inherently non-linear, which makes it difficult to build a QoS prediction model. In [28], the researchers used an ANN to propose a novel QoS prediction approach, presenting experimental results on large-scale QoS data and guaranteeing the sustainability of the system. Fuzzy logic is used for security purposes in mobile and cloud computing. The authors in [34] trained ANFIS to predict human brain activity so that it can be used in real cases [42]. The authors in [17, 22, 43] focused on enhancing the privacy of individuals’ medical information. The Backtracking Search Algorithm (BSA) and an ANFIS model have been used to simulate the Ontario electricity price accurately, and the simulation results were compared to determine the better-optimized model between ANN and ANFIS [32].

The authors in [25] implemented a linear-kernel SVM for the classification and prediction of social networking data. The accuracy of the social Internet of Things (IoT) prediction model ranged from 80 to 90%. A hybrid model combining deep learning and classical machine learning was proposed in [27] for mask detection. SVM, DTs, and other machine learning algorithms were investigated, and SVM achieved the highest accuracy, 99.64%. The authors in [25] proposed a medical expert system to detect heart-related problems. In this system, electrocardiography (ECG) signals are preprocessed, and SVM and other classifiers are used together with noise removal and HRV feature extraction [50]. The authors in [53] used the ANFIS model for Cooperative Localization (CL) on a dataset verified by lake trials. A fuzzy SVM was also implemented for facial emotion recognition in [57]. An expert system was proposed in 2019 to diagnose heart disease based on various parameters.

3 Proposed system

The proposed model classifies and identifies the parameters of COVID-19 for early detection, with the help of machine learning classifiers and ANFIS. First, the dataset is classified and compared using DT, KNN, and SVM classifiers. Then, the ANFIS predictive model is trained to predict the COVID-19 risk. Figure 1 shows the flow of the proposed system.

Fig. 1 Proposed system for COVID-19 risk prediction

3.1 Dataset collection

We use the COVID-19 dataset published on Kaggle (Footnote 6). The dataset contains five attributes, including the number of confirmed, recovered, and death cases. These attributes are used for the classification and identification of COVID-19 parameters. Classifiers are trained on the dataset to separate death cases from recovered cases. The dataset covers 1001 cities, each described by the confirmed, recovered, and death attributes. The proposed system targets patients who recovered from the virus, and the risk factor of the disease is predicted by the ANFIS model. The data are labeled with two classes, ’0’ and ’1’: 0 represents the ’death cases’ of a province/state, and 1 represents the ’recovered cases’.

Table 1 Attributes of dataset

Table 1 describes the attributes of the dataset. The dataset contains the total number of states where COVID-19 spread in the human population and the total number of confirmed, death, and recovered cases in these states.

3.2 Data preprocessing

The COVID-19 dataset contains many missing values, which are handled by interpolation: each missing value is filled with the mean, median, or mode of the respective feature. The dataset also contains duplicate records, which we remove to obtain the best results from all attributes.
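The preprocessing described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout, the column names (`state`, `confirmed`, `deaths`), and the helper names `impute_missing` and `drop_duplicates` are hypothetical and are not taken from the Kaggle dataset or from the authors' implementation.

```python
from statistics import mean, median, mode

def impute_missing(rows, column, strategy="mean"):
    """Fill None entries of `column` with the mean, median, or mode of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence of each."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical records; the column names and values are illustrative only.
records = [
    {"state": "A", "confirmed": 10, "deaths": 1},
    {"state": "B", "confirmed": None, "deaths": 0},
    {"state": "C", "confirmed": 30, "deaths": 2},
    {"state": "C", "confirmed": 30, "deaths": 2},  # exact duplicate of the previous row
]

# Duplicates are removed first so that they do not bias the imputed value.
cleaned = impute_missing(drop_duplicates(records), "confirmed", strategy="mean")
```

Removing duplicates before imputing is a design choice of this sketch; the paper does not state the order of the two steps.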

Table 2 Dataset description of COVID-19

Table 2 describes the dataset, which contains 1001 instances of COVID-19. Next, feature extraction is performed on the dataset, converting the raw data into numerical features. The best features are selected based on histogram plots. The features ’death cases’ and ’recovered cases’ have the highest probability of data in the COVID-19 dataset.

3.3 Machine learning models

This section presents the machine learning models used for risk prediction.

Decision Tree (DT): A supervised machine learning technique that splits the dataset into two or more classes to solve a classification problem [7]. A DT is a tree in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. A DT can be trained on both continuous and binary variables. There are different kinds of DT: linear DT, medium DT, and complex DT. The dataset is classified using all of these DT classifiers.

K-Nearest Neighbour (KNN): KNN classifies the dataset based on similarity and distance measures. It is configured by a distance metric and the number of nearest neighbors [55]. In this paper, the nearest neighbors are determined using the Euclidean Distance (ED) shown in Eq. (1).

$$\begin{aligned} \mathrm{Euclidean\ Distance}\ (d) = \sqrt{\sum \limits ^m_{i=1}{{(x_i-y_i)}^2}} \end{aligned}$$
(1)

KNN is further divided into six kinds: fine KNN, medium KNN, coarse KNN, cosine KNN, cubic KNN, and weighted KNN.
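As an illustration of how Eq. (1) drives the classification, the following minimal sketch implements a plain majority-vote KNN. The function names, the toy points, and the choice k = 3 are assumptions made for the example and do not correspond to any of the six KNN variants evaluated in this paper.

```python
import math
from collections import Counter

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors, as in Eq. (1)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean_distance(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): class 0 clusters near the origin, class 1 near (5, 5).
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_points, train_labels, (4.5, 5.2), k=3)
```

The query point lies next to the second cluster, so its three nearest neighbors all carry label 1.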

Support Vector Machine (SVM): A supervised learning approach that processes and classifies nonlinear, high-dimensional, and unbalanced data. The SVM algorithm is based on risk minimization [11] and is well suited to training on large datasets [46, 47]. The data are classified using different types of SVM classifiers. The COVID-19 dataset contains values below 1000 and some extreme values above 4000. In an SVM classifier [56], let the training set be \(\{(x_1, y_1), (x_2, y_2), \dots , (x_n, y_n)\}\), where \(x_i\) is an input vector and \(y_i\) its label. The partition hyperplane can be defined as

$$\begin{aligned} \omega \cdot x + b=0 \end{aligned}$$
(2)

In Eq. (2), b is the offset of the hyperplane and \(\omega \) is the normal vector of the partition hyperplane. The optimal hyperplane is found by minimizing the objective in Eq. (3):

$$\begin{aligned} \mathrm{Minimize}\ \phi \left( \omega \right) =\frac{1}{2}{\Vert \omega \Vert }^2 \end{aligned}$$
(3)

The Lagrange function, which incorporates the constraints \(y_i(\omega \cdot x_i + b) \ge 1\) through the multipliers \(\alpha _i\), is defined in Eq. (4):

$$\begin{aligned} L\left( \omega ,b,\alpha \right) =\frac{1}{2}\left( \omega \cdot \omega \right) -\sum \limits ^{n}_{i=1}{{\alpha }_i\left( y_i\left( \omega \cdot x_i+b\right) -1\right) } \end{aligned}$$
(4)

The dataset D used to determine the hyperplane is the set of n pairs (\(x_i\), \(y_i\)) given in Eq. (5).

$$\begin{aligned} D= \{(x_{i},y_{i}) \mid x_{i} \in \mathbb {R}^{p},\ y_{i} \in \{-1,1\}\}_{i=1}^{n} \end{aligned}$$
(5)

SVM is divided into different types: linear SVM, quadratic SVM, cubic SVM, fine Gaussian SVM, medium Gaussian SVM, and coarse Gaussian SVM.
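To make Eqs. (2) to (4) concrete, the sketch below evaluates the hyperplane, the objective, and the Lagrange function for a fixed, hand-picked \(\omega \) and b. It does not train an SVM; the weights, offset, multipliers, and sample points are hypothetical values chosen so that every sample satisfies the margin constraint.

```python
def decision_value(omega, b, x):
    """Left-hand side of Eq. (2): omega . x + b."""
    return sum(w * xi for w, xi in zip(omega, x)) + b

def classify(omega, b, x):
    """Assign label +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(omega, b, x) >= 0 else -1

def objective(omega):
    """Objective of Eq. (3): one half of the squared norm of omega."""
    return 0.5 * sum(w * w for w in omega)

def satisfies_margin(omega, b, samples):
    """Check the constraint y_i (omega . x_i + b) >= 1 for every sample."""
    return all(y * decision_value(omega, b, x) >= 1 for x, y in samples)

def lagrangian(omega, b, alpha, samples):
    """Lagrange function of Eq. (4) for given multipliers alpha_i."""
    penalty = sum(
        a * (y * decision_value(omega, b, x) - 1)
        for a, (x, y) in zip(alpha, samples)
    )
    return objective(omega) - penalty

# Hypothetical separable data with labels in {-1, +1}, as in Eq. (5).
samples = [((1.0, 1.0), -1), ((0.0, 1.0), -1), ((2.0, 2.0), 1), ((3.0, 1.0), 1)]
omega, b = (1.0, 1.0), -3.0
alpha = (0.5, 0.0, 0.5, 0.0)  # illustrative multipliers, nonzero only on margin points

margin_ok = satisfies_margin(omega, b, samples)
value = lagrangian(omega, b, alpha, samples)
```

Because the nonzero multipliers sit on points that meet the constraint with equality, the penalty term vanishes and the Lagrange function equals the objective.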

Adaptive Neuro-Fuzzy Inference System (ANFIS): ANN-based fuzzy expert systems yield rule-based models that approximate a human-like expert system [15], whereas ANFIS combines a fuzzy inference system (FIS) with an ANN [33]. Because ANFIS is a hybrid system, its learning ability is more efficient than that of FIS models [4, 35], and it establishes a useful mapping between input and output [10]. The nodes in the same layer of the architecture perform the same function. ANFIS is therefore applied to the collected dataset to generate a predictive linear expert model that computes the risk prediction level of COVID-19. The ANFIS layers are described as follows:

Layer 1: generates a membership function for each node. For an input x, it produces the membership grade \(\mu _{A_i}(x)\), where \(A_i\) is the linguistic label (low, medium, high) associated with node i, as shown in Eq. (6).

$$\begin{aligned} O_{i}^{1} = \mu _{A_i}(x), \quad i=1, 2. \end{aligned}$$
(6)

Layer 2: Every node in layer two is drawn as a circle. This layer multiplies the signals it receives and outputs the product, as shown in Eq. (7).

$$\begin{aligned} w_{i} = \mu _{A_i}(x) \times \mu _{B_i}(y), \quad i=1,2,\dots ,n \end{aligned}$$
(7)

The output that it gives is the firing strength of the rule.

Layer 3: In this layer, the nodes are drawn as circles labeled N. The ith node calculates the ratio of the firing strength of the ith rule to the sum of the firing strengths of all rules, as in Eq. (8).

$$\begin{aligned} w_{i}^{'}= \frac{w_i}{w_1+w_2}, \quad i=1,2 \end{aligned}$$
(8)

The output of this layer is called the normalized firing strength.

Layer 4: This layer multiplies the output of Layer 3 by the output of the Sugeno model, as shown in Eq. (9).

$$\begin{aligned} O_{i}^{4}= w_{i}^{'}f_{i} = w_{i}^{'}(p_{i}x + q_{i}y + r_{i}) \end{aligned}$$
(9)

In Eq. (9), \(p_i\), \(q_i\), and \(r_i\) form the parameter set; the parameters in this layer are known as consequent parameters.

Layer 5: This is the final layer. It sums all the signals it receives and is drawn as a circle node labeled \(\sum \), as shown in Eq. (10).

$$\begin{aligned} O_{1}^{5} = \mathrm{OUTPUT}= \sum _{i}w_{i}^{'}f_{i} \end{aligned}$$
(10)

The dataset is passed through all these layers of ANFIS. This helps the model in giving the most accurate risk prediction of this disease.
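The five layers can be traced with a small forward-pass sketch for two inputs and two rules. The layer descriptions do not fix the shape of the membership functions, so Gaussian functions are assumed here; the premise and consequent parameter values are hypothetical, and no training (parameter update) is performed.

```python
import math

def gaussian_mf(value, center, sigma):
    """Assumed Gaussian membership function; the paper does not fix the shape."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass through the five layers for two inputs and two rules."""
    # Layer 1: membership grades of x in A_i and of y in B_i (Eq. 6)
    mu_a = [gaussian_mf(x, c, s) for c, s in premise["A"]]
    mu_b = [gaussian_mf(y, c, s) for c, s in premise["B"]]
    # Layer 2: firing strength of each rule as the product of grades (Eq. 7)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths (Eq. 8)
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layer 4: weighted first-order Sugeno consequents p*x + q*y + r (Eq. 9)
    o4 = [wn * (p * x + q * y + r) for wn, (p, q, r) in zip(w_norm, consequent)]
    # Layer 5: overall output as the sum of the incoming signals (Eq. 10)
    return sum(o4), w_norm

# Hypothetical parameters: (center, sigma) per membership function, (p, q, r) per rule.
premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 4.0)]

output, normalized = anfis_forward(1.0, 1.0, premise, consequent)
```

With the inputs placed midway between the two membership centers, both rules fire equally and the output is the average of the two rule consequents.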

4 Evaluation and results

Results are evaluated using performance measures, with the test data evaluated by K-fold cross-validation. This method computes the accuracy from the number of observations and the k validation folds, and makes predictions on the input data according to the number of folds. For this data, the number of validation folds is 5. The most suitable classifier for the dataset is selected based on the performance measures.
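A minimal sketch of how 5-fold cross-validation partitions the data is given below. The helper name `k_fold_indices` is hypothetical, and the sketch uses contiguous folds without shuffling, which is an assumption; the tool used for the experiments may partition the observations differently.

```python
def k_fold_indices(n_samples, k=5):
    """Split indices 0..n_samples-1 into k contiguous (train, test) index pairs."""
    # The first (n_samples % k) folds receive one extra observation.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

# 1001 instances, as in the dataset description, with 5 validation folds.
folds = k_fold_indices(1001, k=5)
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

Each observation appears in exactly one test fold, and the reported accuracy is then typically the average over the k held-out folds.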

Table 3 Classification performance using accuracy measure

Table 3 presents the performance measures: accuracy, sensitivity, specificity, and f-measures.
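For reference, these measures follow from the confusion-matrix counts (TP, FP, FN, TN) through their standard definitions, sketched below. The counts in the example are hypothetical and are not taken from Tables 4 to 8.

```python
def performance_measures(tp, fp, fn, tn):
    """Standard performance measures computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    # F-measure: harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }

# Hypothetical counts for illustration only.
measures = performance_measures(tp=9, fp=1, fn=3, tn=7)
```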

4.1 Machine learning for COVID-19

Table 4 shows the evaluation of the DT classifiers. All DT classifiers have the same specificity, 13.78%, because their confusion-matrix values are the same. The comparison is also based on sensitivity, which measures the completeness of the classifier; the sensitivity of all DT classifiers is 96.00%. The other measures, precision and F-measure, are also 96.00% for all DT classifiers because of the identical TN, FP, FN, and TP values.

Table 4 Comparison of DT performance measures (%)

Figure 2 shows the confusion matrix of the DT, giving the TN, FP, FN, and TP values of the classifier. ROC curves show the true-positive and false-positive rates of the currently selected, trained classifier. The ROC curve for the complex DT is shown in Fig. 3; it shows one negative class and an area of 1, meaning that 100% of the ROC graph is under the curve.

Fig. 2 Confusion matrix of complex DT classifier

Fig. 3 ROC curve for complex DT

KNN is further divided into six variants, i.e., fine, medium, coarse, cosine, cubic, and weighted. Table 5 shows the positive and negative values of all types of KNN.

Table 5 True and negative values of KNN

The coarse KNN achieved the highest specificity, 57.33%, as shown in Table 6. The fine KNN achieved the highest completeness among all KNN classifiers, 96.52%, measured through sensitivity. The medium KNN shows the highest precision, 96.52%, while the highest accuracy of predicted instances is obtained with the fine KNN. Fine KNN also achieved the highest F-measure, the harmonic mean of precision and sensitivity. Based on the comparison of all KNN classifiers, fine KNN achieved the highest accuracy and is therefore selected as the best-optimized KNN model.

Table 6 Comparison of KNN performance measures (%)

Fig. 4 Confusion matrix of fine KNN classifier

Fig. 5 ROC curve for fine KNN classifier

Figure 4 shows the confusion matrix of the fine KNN classifier with its TN, FP, FN, and TP values. ROC curves show the true-positive and false-positive rates of the classifier. Figure 5 shows one negative class, with an area of 0.915914 of the ROC graph under the curve of the positive predictive class.

SVM is likewise subdivided into linear, quadratic, cubic, fine Gaussian, medium Gaussian, and coarse Gaussian variants [6]. Table 7 shows the TN, FP, FN, and TP values for the SVM classifiers.

Table 7 True and negative values of SVM

Fine Gaussian SVM achieved the highest specificity among all SVM variants, 37.33%. Completeness, measured through sensitivity, is 98.06%, as shown in Table 8. Precision measures the exactness of the predictions, and fine Gaussian SVM yields 93.48% precision. The cubic SVM achieves the highest F-measure, 94.78%, while linear SVM achieves the highest accuracy, 100%. Based on the best accuracy, the linear SVM classifier is the most appropriate and optimized SVM model for the COVID-19 dataset.

Table 8 Comparison of SVM performance measures

Fig. 6 Confusion matrix of linear SVM classifier

Fig. 7 ROC curve for linear SVM classifier

Figure 6 shows the confusion matrix of the linear SVM classifier, with the number of observations in each cell corresponding to the TN, FP, FN, and TP values. ROC curves show the true-positive and false-positive rates of the currently selected, trained classifier. Figure 7 shows one negative class and an area of 1, meaning that 100% of the ROC graph is under the positive predictive class curve. Linear SVM therefore classified 100% of the values in the COVID-19 dataset correctly, the best classification result. Furthermore, the risk prediction level is determined from the data classified by the classifiers.

4.2 ANFIS for COVID-19

With the help of SVM, the correctly predicted values are separated from the dataset. These positive values are used to generate the input parameters of the COVID-19 dataset for ANFIS. Based on the classified recovered cases, a new dataset is generated for the COVID-19 risk predictive model. Its inputs are the COVID-19 parameters, i.e., temperature, cough, shortness of breath, age, and immunity, each with the levels low, medium, and high. These parameters and datasets were generated with the help of different websites and expert advice. The output parameter is the risk prediction (low, medium, high). The input parameters are based on the symptoms of COVID-19 specified by the consultants.

Table 9 Linguistic labels for fuzzy variables

Table 9 shows that the input parameters are assigned with linguistic variables and specified ranges.

Table 10 Input data collection

Table 10 contains the input data used for building the rules and for further preprocessing. The values of cough, shortness of breath, and immunity are expressed as percentages (e.g., 0.3 × 100 = 30%). The sample data consist of 300 instances, of which about 70% are used for training and 30% for testing. The Sugeno FIS model always produces predictions as numeric data [39].
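The 70/30 partition of the 300 instances can be sketched as follows. The function name `train_test_split`, the shuffling step, and the fixed seed are assumptions made for reproducibility of the example; the text does not say how the instances were assigned to the two sets.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 300 placeholder instances standing in for the rows of the input data.
samples = list(range(300))
train_set, test_set = train_test_split(samples)
```

With 300 instances this yields 210 training and 90 testing instances.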

Fig. 8 Sugeno FIS model

Figure 8 shows the proposed Sugeno FIS model for COVID-19 risk prediction, with temperature, cough, immunity, shortness of breath, and age as input parameters, their linkage with the ANFIS Sugeno model [59], and the generated rules for determining the risk prediction. Figure 9 shows the proposed ANFIS predictive model, in which the COVID-19 input parameters are loaded into the input variables and the applicable rules are used for defuzzification to obtain the risk prediction as output.

The steps of the fuzzy inference system for calculating the risk prediction are given below:

  1. Identifying the input parameter that helps in the estimation of the disease.

  2. Load the data values of the input parameters.

  3. The parameters are assigned to linguistic variables.

  4. Assigning ranges of the variables and plot their membership functions.

  5. Knowledge base containing information base and control rule base.

  6. Generating rules according to the input parameters that affect the system.

  7. Graphical representation of the rules.

  8. Aggregation of generated random rules output.

  9. Defuzzification of the interface.

  10. Surface Viewer of the input and output parameters.

  11. Train and test data.

  12. Generate ANFIS structure model.

Fig. 9 ANFIS predictive model

The proposed system implements all these steps and predicts the risk level of the people affected with COVID-19. Training data is loaded for the training of the Sugeno-based ANFIS risk prediction model. Almost 70% of the whole data is loaded into MATLAB.

Generating ANFIS: Next, we implement the ANFIS for the selected Sugeno model after defining the inputs, parameters, and output variables [48]. The structure of the ANFIS model consists of the input parameters, the input membership functions, and the fuzzy rules, which are the backbone of fuzzy logic. The Sugeno model is developed in a fuzzy inference system by taking temperature, cough, immunity, shortness of breath, and age as inputs and risk prediction as the output, as shown in Fig. 10.

Fig. 10 Sugeno model showing input and output

In fuzzy logic, the membership function of a fuzzy set generalizes the indicator function of classical sets. It represents the degree of truth as an extension of valuation. We select each input and define the membership function for each parameter. Unlike in a Mamdani FIS, the Sugeno membership parameters are selected automatically. The membership functions are defined by specifying the type of the input membership functions and the type of the output membership functions.

In Fig. 11, three membership functions (low, medium, and high) are defined over suitable ranges of the input values for the COVID-19 risk prediction. Each parameter has three membership functions (low, medium, and high) used to predict the risk factor [30], and for each parameter the ranges of low, medium, and high are defined in its membership plot [51].

The membership functions thus support risk prediction within the specified ranges.
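As a hedged illustration of how an input value is mapped to the three linguistic labels, the sketch below uses triangular membership functions over a normalized range of 0 to 1. Both the triangular shape and the breakpoints are assumptions; the actual ranges are those of Table 9 and Fig. 11, which are not reproduced here.

```python
def triangular_mf(value, left, peak, right):
    """Triangular membership function rising from `left` to `peak`, falling to `right`."""
    if value <= left or value >= right:
        return 0.0
    if value <= peak:
        return (value - left) / (peak - left)
    return (right - value) / (right - peak)

# Hypothetical breakpoints on a normalized 0..1 scale (not the ranges of Table 9).
LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

def fuzzify(value):
    """Return the membership grade of `value` in each linguistic label."""
    return {label: triangular_mf(value, *params) for label, params in LABELS.items()}

grades = fuzzify(0.25)
```

A value of 0.25 is halfway between the peaks of low and medium, so it belongs to both labels with grade 0.5 and to high with grade 0.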

Fig. 11 Membership function of temperature associating inputs with outputs

After the membership ranges are defined, the rules are specified in if-then form to detect the risk. There are 215 rules in the rule editor. Each rule combines the five input variables, each with one of three membership functions. Sample rules are listed below.

  • IF (age is low) and (temperature is low) and (cough is low) and (shortness_of_breath is low) and (Immunity is low) THEN (risk_prediction is high)

  • IF (age is low) and (temperature is low) and (cough is low) and (shortness_of_breath is low) and (Immunity is medium) THEN (risk_prediction is medium)

  • IF (age is medium) and (temperature is low) and (cough is medium) and (shortness_of_breath is low) and (Immunity is high) THEN (risk_prediction is low)

  • IF (age is high) and (temperature is medium) and (cough is high) and (shortness_of_breath is medium) and (Immunity is medium) THEN (risk_prediction is high)

The rules are randomly generated based on the symptoms that indicate the disease; e.g., a person whose age is below 11 or above 70 has low immunity, and low immunity leads to a higher risk of virus infection. Patients with diabetes, cancer, or heart disease also need strict precautions because they have weak immune systems. Fuzzy IF/THEN rules with variations in output are shown in Tables 11, 12 and 13. The rules are built from the five input parameters and their three membership functions; five to the power of three gives 125 rules generated in the FIS.
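The way such IF/THEN rules combine into a single numeric risk value can be sketched with a zero-order Sugeno evaluation. The four rules encoded below are taken from the sample rules listed above; the crisp output levels in `RISK_LEVEL`, the product operator for "and", and the membership grades in the example are assumptions made for illustration only.

```python
# Sample rules from the text, encoded as (antecedent, consequent label).
RULES = [
    ({"age": "low", "temperature": "low", "cough": "low",
      "shortness_of_breath": "low", "immunity": "low"}, "high"),
    ({"age": "low", "temperature": "low", "cough": "low",
      "shortness_of_breath": "low", "immunity": "medium"}, "medium"),
    ({"age": "medium", "temperature": "low", "cough": "medium",
      "shortness_of_breath": "low", "immunity": "high"}, "low"),
    ({"age": "high", "temperature": "medium", "cough": "high",
      "shortness_of_breath": "medium", "immunity": "medium"}, "high"),
]

# Hypothetical crisp outputs for the three risk labels (zero-order Sugeno).
RISK_LEVEL = {"low": 0.0, "medium": 0.5, "high": 1.0}

def firing_strength(antecedent, memberships):
    """Combine the membership grades of all conditions with the product operator."""
    strength = 1.0
    for variable, label in antecedent.items():
        strength *= memberships[variable][label]
    return strength

def sugeno_output(rules, memberships):
    """Weighted average of rule outputs; returns None if no rule fires."""
    weights = [firing_strength(antecedent, memberships) for antecedent, _ in rules]
    total = sum(weights)
    if total == 0:
        return None
    weighted = sum(w * RISK_LEVEL[label] for w, (_, label) in zip(weights, rules))
    return weighted / total

# Hypothetical membership grades for one patient.
fully_low = {"low": 1.0, "medium": 0.0, "high": 0.0}
memberships = {
    "age": fully_low,
    "temperature": fully_low,
    "cough": fully_low,
    "shortness_of_breath": fully_low,
    "immunity": {"low": 0.5, "medium": 0.5, "high": 0.0},
}

risk = sugeno_output(RULES, memberships)
```

Here only the first two rules fire, each with strength 0.5, so the result lies halfway between the assumed medium and high levels.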

The rules are generated in the fuzzy inference system. The rule viewer shows how the shape of the membership functions affects the final results. The rule viewer is shown in Fig. 12.

Fig. 12 Fuzzy rule base of risk predictor

Tables 11, 12 and 13 show the IF/THEN rules for the input and output parameters, grouped by the membership function (low, medium, and high) of the output.

Table 11 Fuzzy if/then rules when output is low
Table 12 Fuzzy if/then rules when output is medium
Table 13 Fuzzy if/then rules when output is high

For training and testing, 70% of the data is used for training and 30% for testing [31]. The training data of the risk prediction are shown in Fig. 13; the error tolerance for training is 0.0014794.

Fig. 13 Training data of proposed solution

About 30%–35% of the dataset is loaded for testing. The average testing error of the proposed solution is 4.155, as shown in Fig. 14. Testing is done by loading the file into the FIS. Figure 15 shows the surface viewer of the output. The training data are overlaid on the testing data to check whether the predicted values are correct; the overlap indicates the correctness of the procedure.

Fig. 14 Testing of proposed solution

Fig. 15 Surface viewer of risk test

Figure 16 represents the ANFIS structure after training and testing the data.

Fig. 16 ANFIS structure of risk prediction

4.3 Comparative analysis

Table 14 presents the comparative analysis of the classification algorithms, showing the performance measures of the best variant of each classifier, i.e., linear SVM, fine KNN, and complex DT. The comparison shows that SVM achieved the highest accuracy, 100%, compared with DT and KNN on the COVID-19 dataset. SVM also achieved 100% completeness and 100% precision, which indicates that it classifies the dataset more accurately than the other classifiers. SVM is therefore the best classifier for the COVID-19 dataset. The proposed model reaches high prediction and classification accuracy with the classification techniques (DT, KNN, SVM).

Table 14 Comparison of classification algorithms

5 Conclusion

COVID-19 is a global health threat caused by a virus that can infect a person through respiratory droplets from an infected person. The rising death toll also affects countries’ economies and has created a pandemic situation. In this paper, different machine learning classification algorithms (DT, KNN, and SVM) are tested on COVID-19 data and compared based on their performance measures. ANFIS, which is used to model and control ill-defined and uncertain systems, is applied to predict the risk factor of this globally spread disease. The COVID-19 dataset is classified using the Support Vector Machine (SVM) because it achieved the highest accuracy, 100%, among all classifiers. Furthermore, the ANFIS model is applied to this classified dataset, which results in an 80% risk prediction for COVID-19. In the future, we shall apply the algorithm to data on new variants of COVID-19 seen in other parts of the world.