1 Introduction

The coronavirus disease (COVID-19) pandemic emerged in Wuhan, China, in December 2019 and became a serious public health problem worldwide [1, 2]. Until now, no specific drug or vaccine has been found against COVID-19 [2]. The virus that causes the COVID-19 epidemic is called severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) [3]. Coronaviruses (CoV) are a large family of viruses that cause diseases such as Middle East respiratory syndrome (MERS-CoV) and severe acute respiratory syndrome (SARS-CoV). COVID-19 is caused by a new strain discovered in 2019 that had not previously been identified in humans [4]. According to early data, COVID-19 causes mild symptoms in about 99% of cases, while the remainder are severe or critical [5]. As of January 31, 2021, the total number of coronavirus cases worldwide was 103,286,991, including 2,232,776 deaths; of these, 26,127,156 were active patients [6]. The world is currently struggling with the COVID-19 epidemic, and deaths from pneumonia caused by the SARS-CoV-2 virus are increasing day by day.

Chest radiography (X-ray) is one of the most important methods used for the diagnosis of pneumonia worldwide [7]. Chest X-ray is a fast, cheap [8] and common clinical method [9,10,11]. A chest X-ray also exposes the patient to a lower radiation dose than computed tomography (CT) and magnetic resonance imaging (MRI) [11]. However, making a correct diagnosis from X-ray images requires expert knowledge and experience [7], and diagnosis from a chest X-ray is much more difficult than from other imaging modalities such as CT or MRI [8].

Only specialist physicians can diagnose COVID-19 by looking at a chest X-ray, and the number of specialists able to make this diagnosis is far smaller than the number of general physicians. Even in normal times, the number of doctors per capita is insufficient in countries around the world. According to data from 2017, Greece ranks first with 607 doctors per 100,000 people; in other countries, this number is much lower [12].

In disasters such as the COVID-19 pandemic, when many people demand health services at the same time, the collapse of the health system is inevitable because of the insufficient number of hospital beds and health personnel. Moreover, COVID-19 is a highly contagious disease, and doctors, nurses and caregivers are most at risk. Early diagnosis of pneumonia is therefore vitally important, both for slowing the spread of the epidemic by quarantining the patient and for the patient's recovery.

Thanks to computer-aided diagnosis (CAD), doctors can diagnose pneumonia from a chest X-ray more quickly and accurately [8]. The use of artificial intelligence methods in medical services is increasing because of their ability to cope with enormous datasets that exceed human capacity [13]. Integrating CAD methods into radiological diagnostic systems greatly reduces the workload of doctors and increases reliability and quantitative analysis [11]. CAD systems based on deep learning and medical imaging have therefore become an increasingly active research field [13, 14].

In this study, we propose automatic CAD-based prediction of COVID-19 using deep convolutional neural network-based pre-trained transfer learning models and chest X-ray images. For this purpose, we used the ResNet50, ResNet101, ResNet152, InceptionV3 and Inception-ResNetV2 pre-trained models to obtain high prediction accuracy on three different binary datasets comprising X-ray images of normal (healthy), COVID-19, bacterial pneumonia and viral pneumonia patients.

The novelty and originality of the proposed study are summarized as follows: (1) The proposed models have an end-to-end structure without manual feature extraction, selection or classification. (2) The performance in separating COVID-19 from the normal, viral pneumonia and bacterial pneumonia classes was significantly higher. (3) The study uses more data than many studies in the literature. (4) Five different CNN models were trained and compared. (5) A high-accuracy decision support system is proposed to help radiologists in the automatic diagnosis, detection and follow-up of patients with suspected COVID-19.

The rest of the manuscript is organized as follows: Work on deep learning techniques applied to chest X-ray and CT images for COVID-19 is presented in Sect. 2. The dataset is described in detail in Sect. 3.1. The deep transfer learning architecture, pre-trained models and experimental setup parameters are described in Sects. 3.2 and 3.3, respectively. Performance metrics are given in detail in Sect. 3.4. The experimental results obtained from the proposed models and their discussion are presented in Sects. 4 and 5, respectively. Finally, in Sect. 6, the conclusion and future work are summarized.

2 Related works

Studies that diagnose COVID-19 from chest X-rays perform either binary or multi-class classification. Some studies use raw data, while others include a feature extraction step. The amount of data used also varies between studies. Among these studies, the most preferred method is the convolutional neural network (CNN).

Apostolopoulos and Bessiana worked on the automatic detection of COVID-19, using a convolutional neural network to differentiate common pneumonia, COVID-19-induced pneumonia and healthy cases. In particular, the procedure called transfer learning was adopted; with transfer learning, the detection of various abnormalities in small medical image datasets is an achievable goal, often with remarkable results [15]. Based on chest X-ray images, Zhang et al. aimed to develop a deep learning-based model that can detect COVID-19 with high sensitivity, providing fast and reliable screening [16]. Singh et al. classified chest computed tomography (CT) images of people with and without COVID-19 using a multi-objective differential evolution (MODE)-based CNN [17]. Jaiswal et al. proposed a DenseNet201-based deep transfer learning model on chest CT images to classify patients as COVID-19 infected or not [14]. Chen et al. proposed a Residual Attention U-Net for automated multi-class segmentation to prepare the ground for quantitative diagnosis of lung infection in COVID-19-related pneumonia using CT images [18]. Adhikari's study suggested a network called "Auto Diagnostic Medical Analysis" that tries to find infected areas to help the doctor better identify the diseased part, if any; both X-ray and CT images were used, and a DenseNet network was recommended to extract and mark infected areas of the lung [19]. In the study by Alqudah et al., two different methods were used to diagnose COVID-19 from chest X-ray images. The first used the AOCTNet, MobileNet and ShuffleNet CNNs; in the second, features were extracted from the images and classified using a softmax classifier, K-nearest neighbors (kNN), support vector machine (SVM) and random forest (RF) algorithms [20]. Khan et al. classified chest X-ray images of normal, bacterial and viral pneumonia cases using the Xception architecture to detect COVID-19 infection [21]. Ghoshal and Tucker used a dropweights-based Bayesian CNN model on chest X-ray images for the diagnosis of COVID-19 [22]. Hemdan et al. used VGG19 and DenseNet models to diagnose COVID-19 from X-ray images [23]. Ucar and Korkmaz worked on X-ray images for COVID-19 diagnosis and supported the SqueezeNet model with Bayesian optimization [24]. Apostolopoulos et al. performed automatic detection from X-ray images using CNNs with transfer learning [25]. Sahinbas and Catak used X-ray images for the diagnosis of COVID-19 and worked with the VGG16, VGG19, ResNet, DenseNet and InceptionV3 models [26]. Medhi et al. applied feature extraction and segmentation to X-ray images and then classified them as COVID-19 positive or normal using a CNN [27]. Barstugan et al. classified X-ray images for the diagnosis of COVID-19 using five different feature extraction methods: Grey-Level Co-occurrence Matrix (GLCM), Local Directional Patterns (LDP), Grey-Level Run Length Matrix (GLRLM), Grey-Level Size Zone Matrix (GLSZM) and Discrete Wavelet Transform (DWT). The obtained features were classified with an SVM, using two-fold, five-fold and ten-fold cross-validation [28]. Punn and Agarwal worked on X-ray images and used ResNet, InceptionV3 and Inception-ResNet models to diagnose COVID-19 [29]. Afshar et al. developed deep neural network (DNN)-based diagnostic solutions and offered an alternative modeling framework based on Capsule Networks that can be trained on small datasets [30].

In our previous study in March 2020, we used the ResNet50, InceptionV3 and Inception-ResNetV2 models for the diagnosis of COVID-19 from chest X-ray images. However, since there were not enough COVID-19 data at that time, we were only able to train on 50 normal and 50 COVID-19-positive cases [31]. Therefore, to overcome the limitations of our previous study [31], the present study was conducted with a larger number of images and additional deep transfer learning models to classify COVID-19-infected patients.

3 Materials and methods

3.1 Dataset

In this study, chest X-ray images of 341 COVID-19 patients were obtained from the open-source GitHub repository shared by Dr. Joseph Cohen et al. [32]. This repository consists of chest X-ray/computed tomography (CT) images mainly of patients with acute respiratory distress syndrome (ARDS), COVID-19, Middle East respiratory syndrome (MERS), pneumonia and severe acute respiratory syndrome (SARS). In addition, 2800 normal (healthy) chest X-ray images were selected from the "ChestX-ray8" database [33], and 2772 bacterial and 1493 viral pneumonia chest X-ray images were taken from the Kaggle repository "Chest X-Ray Images (Pneumonia)" [34].

Our experiments are based on three binary datasets (Dataset-1, Dataset-2 and Dataset-3) created from these chest X-ray images. The distribution of images per class in each dataset is given in Table 1.

Table 1 Number of images per class for each dataset

Data augmentation was applied to the training dataset with scaling factor = 1./255, shear range = 0.1, zoom range = 0.1 and horizontal flipping enabled. All images were resized to 224 \(\times\) 224 pixels. Representative chest X-ray images of normal (healthy), COVID-19, bacterial pneumonia and viral pneumonia patients are given in Fig. 1.
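
A minimal sketch of this augmentation setup, assuming a Keras/TensorFlow ImageDataGenerator implementation (the parameter names and the directory paths are illustrative assumptions, not taken from the original code):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation is applied only to the training set; test images are only rescaled.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,       # scaling factor = 1./255
    shear_range=0.1,        # shear range = 0.1
    zoom_range=0.1,         # zoom range = 0.1
    horizontal_flip=True    # horizontal flipping enabled
)
test_datagen = ImageDataGenerator(rescale=1. / 255)

# Hypothetical directory layout with one sub-folder per class.
train_generator = train_datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=3, class_mode="binary")
test_generator = test_datagen.flow_from_directory(
    "data/test", target_size=(224, 224), batch_size=3, class_mode="binary")
```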

Fig. 1 Representative chest X-ray images of normal (healthy) (first row), COVID-19 (second row), bacterial (third row) and viral pneumonia (fourth row) patients

3.2 Architecture of deep transfer learning

Deep learning is a sub-branch of machine learning inspired by the structure of the brain. In recent years, deep learning techniques have continued to show impressive performance in medical image processing, as in many other fields. By applying deep learning techniques to medical data, researchers try to draw meaningful conclusions from it.

Deep learning models have been used successfully in many tasks such as classification, segmentation and lesion detection in medical data. Image and signal data obtained with medical imaging techniques such as magnetic resonance imaging (MRI), computed tomography (CT) and X-ray have been analyzed with the help of deep learning models. As a result of these analyses, diseases such as diabetes mellitus, brain tumors, skin cancer and breast cancer can be detected and diagnosed more conveniently [35,36,37,38,39,40,41].

A convolutional neural network (CNN) is a class of deep neural networks used in image recognition problems [42]. Regarding how a CNN works, the images given as input must first be converted into a format that computers can process, so they are converted to matrix form. The network determines which image belongs to which label based on differences in the images, and therefore in the matrices; it learns the effect of these differences on the label during the training phase and then uses them to make predictions for new images. To perform these operations effectively, a CNN consists of three different layer types: the convolutional layer, the pooling layer and the fully connected layer. Feature extraction takes place in the convolutional and pooling layers, while classification occurs in the fully connected layer. These layers are examined in turn in the following subsections.
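
For illustration only, a toy Keras model containing the three layer types described in the following subsections might look as follows (this is not the architecture used in this study; layer sizes are arbitrary):

```python
from tensorflow.keras import layers, models

# Toy CNN illustrating the three building blocks: convolution, pooling
# and a fully connected (dense) classifier.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # fully connected layer
    layers.Dense(2, activation="softmax")   # class probabilities
])
model.summary()
```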

3.2.1 Convolutional layer

The convolutional layer is the base layer of the CNN and is responsible for determining the features of the pattern. In this layer, the input image is passed through a filter, and the values resulting from filtering form the feature map. The layer applies kernels that slide over the pattern to extract low- and high-level features [43]. A kernel is typically a 3 \(\times\) 3 or 5 \(\times\) 5 matrix convolved with the input pattern matrix. The stride parameter is the number of steps by which the kernel is shifted over the input matrix. The output of the convolutional layer can be given as:

$$\begin{aligned} x_j^l=f\left( \sum \limits _{a=1}^N{w_j^{l-1}*y_a^{l-1}+b_j^l}\right) \end{aligned}$$
(1)

where \(x_j^l\) is the jth feature map in layer l, \(w_j^{l-1}\) indicates the jth kernel in layer \(l-1\), \(y_a^{l-1}\) represents the ath feature map in layer \(l-1\), \(b_j^l\) indicates the bias of the jth feature map in layer l, N is the total number of feature maps in layer \(l-1\), and \((*)\) represents the vector convolution operation.
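
A small NumPy sketch of Eq. (1), written to mirror the notation above under the simplifying assumption of a single shared kernel per output feature map (function and variable names are ours):

```python
import numpy as np

def conv2d(feature_map, kernel, stride=1):
    """Valid 2-D convolution of one input feature map with one kernel."""
    kh, kw = kernel.shape
    out_h = (feature_map.shape[0] - kh) // stride + 1
    out_w = (feature_map.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i * stride:i * stride + kh,
                                j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def conv_layer_output(prev_maps, kernel, bias, f=lambda z: np.maximum(z, 0.0)):
    """Eq. (1): x_j^l = f( sum_a w_j^(l-1) * y_a^(l-1) + b_j^l )."""
    return f(sum(conv2d(y_a, kernel) for y_a in prev_maps) + bias)

# Two 5x5 input feature maps, one 3x3 kernel, stride 1, ReLU as f.
prev_maps = [np.random.rand(5, 5) for _ in range(2)]
kernel = np.random.rand(3, 3)
print(conv_layer_output(prev_maps, kernel, bias=0.1).shape)  # (3, 3)
```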

3.2.2 Pooling layer

The second layer after the convolutional layer is the pooling layer. The pooling layer is usually applied to the generated feature maps to reduce the number of feature maps and network parameters through the corresponding mathematical computation. In this study, we used max pooling and global average pooling. The max-pooling operation keeps only the maximum value within a window of the specified size in each feature map, which reduces the number of output neurons. Global average pooling is used only before the fully connected layer; it reduces each feature map to a single value, producing a one-dimensional vector that is then connected to the fully connected layer. Another intermediate layer used is the dropout layer, whose main purpose is to prevent the network from overfitting and diverging [44].
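
The two pooling operations can be illustrated with the following NumPy sketch (function names and window sizes are illustrative assumptions):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Max pooling: keep only the maximum of each size x size window."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i * stride:i * stride + size,
                                    j * stride:j * stride + size].max()
    return out

def global_average_pool(feature_maps):
    """Global average pooling: one value per feature map, giving a 1-D vector."""
    return np.array([fm.mean() for fm in feature_maps])

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))                        # 2x2 map of window maxima
print(global_average_pool([fm, 2 * fm]))   # vector of length 2
```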

3.2.3 Fully connected layer

The fully connected layer is the last and most important layer of a CNN and functions like a multilayer perceptron. The rectified linear unit (ReLU) activation function is commonly used in the fully connected layers, while the softmax activation function is used in the final layer to predict the output class probabilities. These two activation functions are computed as follows:

$$\begin{aligned}&{\text {ReLU}}(x)={\left\{ \begin{array}{ll} 0, &{} {\text {if}} \ x<0 \\ x, &{} {\text {if}} \ x\ge 0 \\ \end{array}\right. } \end{aligned}$$
(2)
$$\begin{aligned}&{\text {Softmax}}(x_i)=\frac{e^{x_i}}{\sum \nolimits _{y=1}^m{e^{x_y}}} \end{aligned}$$
(3)

where \(x_i\) and m represent the input data and the number of classes, respectively. Neurons in a fully connected layer have full connections to all activations in the previous layer.
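
Eqs. (2) and (3) correspond directly to the following NumPy sketch (the stability shift in the softmax is a standard numerical precaution, not part of Eq. (3)):

```python
import numpy as np

def relu(x):
    """Eq. (2): ReLU(x) = 0 for x < 0, x otherwise."""
    return np.maximum(x, 0.0)

def softmax(x):
    """Eq. (3): Softmax(x_i) = exp(x_i) / sum_y exp(x_y) over the m classes."""
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

logits = np.array([-1.0, 2.0, 0.5])
print(relu(logits))     # [0.  2.  0.5]
print(softmax(logits))  # probabilities summing to 1
```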

3.2.4 Pre-trained models

Training convolutional neural network (CNN) models with millions of parameters from scratch is not only very time-consuming but also requires high-performance hardware. To overcome these problems, the parameters and weights of models trained on different datasets are transferred to the new model [45, 46]. Apart from the transferred parts, learning also takes place in the newly added layers. This approach has been reported to be successful even on small datasets [47], and it allows results to be obtained faster at a lower computational cost.

In the analysis of medical data, one of the biggest difficulties faced by researchers is the limited number of available datasets. Deep learning models usually need large amounts of data, and labeling these data by experts is both costly and time-consuming. The biggest advantage of the transfer learning method is that it allows models to be trained with smaller datasets and at a lower computational cost. With transfer learning, which is widely used in deep learning, the knowledge gained by a model pre-trained on a large dataset is transferred to the model to be trained.

In this study, we built deep CNN-based ResNet50, ResNet101, ResNet152, InceptionV3 and Inception-ResNetV2 models for the classification of COVID-19 chest X-ray images into three different binary classes (Binary Class-1 = COVID-19 and normal (healthy), Binary Class-2 = COVID-19 and viral pneumonia, and Binary Class-3 = COVID-19 and bacterial pneumonia). In addition, we applied the transfer learning technique, realized using ImageNet data, to overcome the limited data and long training times. A schematic representation of the conventional CNN pipeline, including the pre-trained ResNet50, ResNet101, ResNet152, InceptionV3 and Inception-ResNetV2 models, for the prediction of normal (healthy), COVID-19, bacterial pneumonia and viral pneumonia patients is depicted in Fig. 2. The code is publicly available at https://github.com/drcerenkaya/COVID-19-DetectionV2.

  • ResNet50

    The residual neural network (ResNet) model is an improved version of the CNN. ResNet adds shortcut (skip) connections between layers, which prevents the degradation that occurs as the network becomes deeper and more complex. In addition, bottleneck blocks are used to make training faster in the ResNet model [48]. ResNet50 is a 50-layer network trained on the ImageNet dataset. ImageNet is an image database with more than 14 million images belonging to more than 20,000 categories created for image recognition competitions [49].

  • InceptionV3

    InceptionV3 is a convolutional neural network model consisting of numerous convolution and max-pooling steps. In the last stage, it contains a fully connected neural network [50]. As with the ResNet50 model, the network is trained on the ImageNet dataset.

  • Inception-ResNetV2

    The model consists of a deep convolutional network using the Inception-ResNetV2 architecture that was trained on the ImageNet-2012 dataset. The input to the model is a 299 \(\times\) 299 image, and the output is a list of estimated class probabilities [51].

  • ResNet101 and ResNet152

    ResNet101 and ResNet152 consist of 101 and 152 layers, respectively, obtained by stacking ResNet building blocks. A pre-trained version of each network, trained on more than a million images from the ImageNet database, can be loaded [49]. As a result, the networks have learned rich feature representations for a wide range of images. Their image input size is 224 \(\times\) 224.
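
As a rough sketch of the transfer learning approach described in this subsection, a pre-trained base such as ResNet50 can be loaded with ImageNet weights and extended with a new classification head (assuming a Keras/TensorFlow implementation; the head shown here, global average pooling, dropout and a softmax output, follows Sects. 3.2.2 and 3.2.3, but its exact settings are assumptions):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Pre-trained ImageNet weights are transferred; a new classification head is added.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                    # dropout rate is an assumption
    layers.Dense(2, activation="softmax")   # binary output (e.g. COVID-19 / normal)
])
```

The base can analogously be replaced by ResNet101, ResNet152, InceptionV3 or Inception-ResNetV2; the model is then trained with the settings given in Sect. 3.3.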

Fig. 2 Schematic representation of pre-trained models for the prediction of normal (healthy), COVID-19, bacterial and viral pneumonia patients

3.3 Experimental setup

The Python programming language was used to train the proposed deep transfer learning models. All experiments were performed on a Google Colaboratory (Colab) Linux server running the Ubuntu 16.04 operating system, using the free online cloud service with Central Processing Unit (CPU), Tesla K80 Graphics Processing Unit (GPU) or Tensor Processing Unit (TPU) hardware. The CNN models (ResNet50, ResNet101, ResNet152, InceptionV3 and Inception-ResNetV2) were trained by optimizing the cross-entropy loss function with the adaptive moment estimation (ADAM) optimizer (\(\beta_1 = 0.9\) and \(\beta_2 = 0.999\)), with the newly added layers initialized with random weights. The batch size, learning rate and number of epochs were experimentally set to 3, 1e\(-\)5 and 30, respectively, for all experiments. Each dataset was randomly split into two independent subsets, with 80% used for training and 20% for testing. Five-fold cross-validation was used, and results were obtained for each of the five folds (folds 1–5), as shown in Fig. 3.
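
The training configuration described above can be sketched as follows, assuming a Keras/TensorFlow implementation and the scikit-learn KFold utility for the five folds (only the hyper-parameter values are taken from the text; the placeholder data and the stand-in model are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

BATCH_SIZE, LEARNING_RATE, EPOCHS = 3, 1e-5, 30  # values reported above

# Placeholder data standing in for the chest X-ray images (not the real dataset).
images = np.random.rand(20, 224, 224, 3).astype("float32")
labels = np.random.randint(0, 2, size=20)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)  # five folds, 80%/20% splits
for fold, (train_idx, test_idx) in enumerate(kfold.split(images), start=1):
    # Trivial stand-in model; in the study this would be one of the pre-trained CNNs.
    model = models.Sequential([
        layers.Flatten(input_shape=(224, 224, 3)),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer=Adam(learning_rate=LEARNING_RATE, beta_1=0.9, beta_2=0.999),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(images[train_idx], labels[train_idx],
              batch_size=BATCH_SIZE, epochs=EPOCHS,
              validation_data=(images[test_idx], labels[test_idx]),
              verbose=0)
    print(f"fold {fold} finished")
```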

Fig. 3 Visual display of testing and training datasets for five-fold cross-validation

3.4 Performance metrics

Five criteria were used to evaluate the performance of the deep transfer learning models. These are:

$$\begin{aligned} {\text {Accuracy}}= & {} \frac{\mathrm{{TP\,+\,TN}}}{{\text {TP\,+\,TN\,+\,FP\,+\,FN}}} \end{aligned}$$
(4)
$$\begin{aligned} {\text {Recall}}= & {} \frac{\mathrm{{TP}}}{{\text {TP\,+\,FN}}}\end{aligned}$$
(5)
$$\begin{aligned} {\text {Specificity}}= & {} \frac{\mathrm{{TN}}}{{\text {TN\,+\,FP}}}\end{aligned}$$
(6)
$$\begin{aligned} {\text {Precision}}= & {} \frac{\mathrm{{TP}}}{{\text {TP\,+\,FP}}}\end{aligned}$$
(7)
$$\begin{aligned} {\text {F1-score}}= & {} \frac{2*{\text {Precision\,*\,Recall}}}{\mathrm{{Precision\,+\,Recall}}} \end{aligned}$$
(8)

TP, FP, TN and FN in Eqs. (4)–(8) represent the number of true positives, false positives, true negatives and false negatives, respectively. For Dataset-1, given a test dataset and a model, TP is the number of positive (COVID-19) images correctly labeled as COVID-19 by the model; FP is the number of negative (normal) images mislabeled as positive (COVID-19); TN is the number of negative (normal) images correctly labeled as normal; and FN is the number of positive (COVID-19) images mislabeled as negative (normal) by the model.
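
For clarity, Eqs. (4)–(8) can be computed from the four counts as in the following sketch (the example counts are arbitrary and do not correspond to any table in this paper):

```python
def classification_metrics(tp, fp, tn, fn):
    """Eqs. (4)-(8): accuracy, recall, specificity, precision and F1-score."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    recall      = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    f1_score    = 2 * precision * recall / (precision + recall)
    return accuracy, recall, specificity, precision, f1_score

# Arbitrary example counts (positive class = COVID-19, negative class = normal).
print(classification_metrics(tp=66, fp=2, tn=558, fn=2))
```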

4 Experimental results

In this paper, we performed three different binary classifications involving four different classes (COVID-19, normal, viral pneumonia and bacterial pneumonia). The five-fold cross-validation method was used to obtain robust results in this study, which was carried out with five different pre-trained models: InceptionV3, ResNet50, ResNet101, ResNet152 and Inception-ResNetV2. While 80% of the data were reserved for training, the remaining 20% were reserved for testing; this process was repeated until every 20% portion had been tested. Regarding training times, for Dataset-1 (Binary Class-1), the total training times of the InceptionV3, ResNet50, ResNet101, ResNet152 and Inception-ResNetV2 pre-trained models were 16027 s, 14638 s, 17841 s, 18802 s and 23078 s, respectively. For Dataset-2 (Binary Class-2), they were 12241 s, 9948 s, 13089 s, 14923 s and 19336 s, respectively. Finally, for Dataset-3 (Binary Class-3), they were 15801 s, 14386 s, 17658 s, 18581 s and 22865 s, respectively.

Firstly, the accuracy and loss values obtained during the training process for the models applied to Dataset-1, which includes Binary Class-1 (COVID-19/normal classes), are given in Figs. 4 and 5. It is clear that the ResNet50 model performs better than the other models and reaches lower loss values. The detection performance on the test data is shown in Fig. 6. While some models oscillate considerably, others are more stable; the ResNet50 model shows less oscillation after the 15th epoch. Comprehensive performance values for each fold of each model are given in Table 2. As seen in Table 2, the detection performance of the ResNet50 model on the COVID-19 class is significantly higher than that of the other models. ResNet50 and ResNet101 achieved the highest overall performance with 96.1%. It is evident that the larger amount of normal data results in higher performance for all models.

Fig. 4 Binary Class-1: comparison of training accuracy of 5 different models for fold-4

Fig. 5 Binary Class-1: comparison of training loss values of 5 different models for fold-4

Fig. 6 Binary Class-1: comparison of testing accuracy of 5 different models for fold-4

Table 2 All performances of 5 different models on each fold for COVID-19/normal binary classification

Secondly, when the results obtained for Binary Class-2 (COVID-19/viral pneumonia classes) are evaluated, the training performances of the models given in Figs. 7 and 8 are quite high. The accuracy and loss values of the ResNet50 and ResNet101 models are better than those of the other models. The performance values obtained on the test data are shown in Fig. 9; here, the results of the models are generally more stable, with no oscillation except for strong oscillation in the first 3 epochs of the ResNet50 model. Detailed performances of the models are given in Table 3. It is clearly seen that quite high values are reached for every fold: 99.4% was reached in the detection of COVID-19, and 99.5% in the detection of viral pneumonia.

Fig. 7 Binary Class-2: comparison of training accuracy of 5 different models for fold-4

Fig. 8 Binary Class-2: comparison of training loss values of 5 different models for fold-4

Fig. 9 Binary Class-2: comparison of testing accuracy of 5 different models for fold-4

Table 3 All performances of 5 different models on each fold for COVID-19/viral pneumonia binary classification

In the last experiment, the detection performance for Binary Class-3 (COVID-19/bacterial pneumonia classes) was investigated. The performances of the five models on both the training and test data are given in Figs. 10, 11 and 12. As in the other experiments, the ResNet50 model clearly exhibits higher training performance, while the InceptionV3 model shows increasing performance toward the end of the training epochs. Evaluating the detailed results in Table 4, the InceptionV3 model achieves 100% performance in the detection of COVID-19, while the ResNet50 model achieves the highest overall performance.

Fig. 10 Binary Class-3: comparison of training accuracy of 5 different models for fold-4

Fig. 11 Binary Class-3: comparison of training loss values of 5 different models for fold-4

Fig. 12 Binary Class-3: comparison of testing accuracy of 5 different models for fold-4

Table 4 All performances of 5 different models on each fold for COVID-19/bacterial pneumonia binary classification

5 Discussion

The use of artificial intelligence-based systems is very common in detecting people infected during the COVID-19 epidemic. As shown in Table 5, there are many studies on this subject in the literature. In binary classification, it is common to distinguish COVID-19-positive from COVID-19-negative cases. In addition, it is very important to distinguish viral and bacterial pneumonia patients, which are other diseases affecting the lung, from COVID-19-positive patients. There are a limited number of studies in the literature that work with multiple classes. Narayan Das et al. conducted a study with three different classes (COVID-19 positive, pneumonia and other infections). The researchers used 70% of the data for training, 10% for validation and 20% for testing, and obtained 97.40% accuracy on the test data with the extreme version of the Inception (Xception) CNN model [9]. Singh et al. proposed a two-class study using limited data; they reported performances for different training/testing splits and achieved the highest accuracy of \(94.65\pm 2.1\) with a 70% training–30% testing split. In their study, they tuned the CNN hyper-parameters using multi-objective adaptive differential evolution (MADE) [52]. Afshar et al. conducted a multi-class study (normal, bacterial pneumonia, non-COVID viral pneumonia and COVID-19) using a method called COVID-CAPS. They achieved 95.7% accuracy without pre-training and 98.3% accuracy with the pre-trained COVID-CAPS. However, their sensitivity values were not as high as the overall accuracy: the sensitivities without pre-training and with the pre-trained COVID-CAPS were 90% and 80%, respectively [30].

Ucar and Korkmaz carried out a multi-class (normal, pneumonia and COVID-19 cases) study with deep Bayes-SqueezeNet and obtained an average accuracy of 98.26%, working with 76 COVID-19 images [24]. Sahinbas and Catak worked with five different pre-trained models (VGG16, VGG19, ResNet, DenseNet and InceptionV3); as their binary classification performance, they achieved 80% accuracy with VGG16, working with 70 COVID-19-positive and 70 COVID-19-negative images in total [26]. Khan et al. worked with normal, bacterial pneumonia, viral pneumonia and COVID-19 chest X-ray images; they achieved 89.6% overall performance with the model they named CoroNet, using 290 COVID-19 images, more than many other studies [21]. Medhi et al. achieved 93% overall performance in their study using a deep CNN, working with 150 COVID-19 images [27].

In another study, Zhang and colleagues performed binary and multi-class classifications containing 106 COVID-19 images and reported a detection accuracy of 95.18% with the confidence-aware anomaly detection (CAAD) model [16]. Apostolopoulos et al. obtained an accuracy of 93.48% using a total of 224 COVID-19 images with the VGG-19 CNN model in their three-class (COVID-19/bacterial/normal) study [25]. Narin et al. used 50 COVID-19 and 50 normal images in their study and achieved 98% accuracy with ResNet50 [31]. In many studies in the literature, researchers have worked with a limited number of COVID-19 images. In this study, the performance of separating 341 COVID-19 images from the other classes was investigated in three different experiments, and five different CNN models were compared. The most important points of the study can be summarized as follows:

  • There is no manual feature extraction, feature selection or classification in this method; it is realized end to end directly from the raw data.
  • The performance in separating COVID-19 from the normal, viral pneumonia and bacterial pneumonia classes was significantly higher.
  • The study uses more data than many studies in the literature.
  • Five different CNN models were trained and compared.
  • A high-accuracy decision support system is proposed to help radiologists in the automatic diagnosis, detection and follow-up of patients with suspected COVID-19.

From another point of view, considering that this pandemic affects the whole world, there has been a serious increase in the workload of radiologists. In manual diagnosis, expert fatigue may increase the error rate, so it is clear that decision support systems will be needed to eliminate this problem and enable more effective diagnosis. The most important limitation of this study is the limited amount of data; increasing the data and testing with data from many different centers will enable more stable systems to be built. In future studies, features will be extracted from X-ray and CT images using image processing methods; the features that provide the best separation between classes will be determined, performance will be measured with different classification algorithms, and the results will be compared with deep learning models. In addition, the results of the study will be tested with data from many different centers. In a further study, artificial intelligence-based systems will be used to relate patients' demographic characteristics to the likelihood of COVID-19.

Table 5 The performance comparison literature about COVID-19 diagnostic methods using chest X-ray images

6 Conclusion

Early prediction of COVID-19 patients is vital to prevent the spread of the disease to other people. In this study, we proposed a deep transfer learning-based approach using chest X-ray images obtained from normal, COVID-19, bacterial pneumonia and viral pneumonia patients to predict COVID-19 patients automatically. The performance results show that the ResNet50 pre-trained model yielded the highest accuracy among the five models on the three datasets used (Dataset-1: 96.1%, Dataset-2: 99.5% and Dataset-3: 99.7%). In the light of our findings, we believe that the high performance obtained will help radiologists make decisions in clinical practice. This study gives insight into how deep transfer learning methods can be used for the early detection of COVID-19. In subsequent studies, the classification performance of different CNN models can be tested by increasing the number of COVID-19 chest X-ray images in the dataset.