1 Introduction

Childhood respiratory diseases are a severe public health problem in many countries, especially in Southeast Asia and sub-Saharan Africa. Data from the World Health Organization (WHO) reveal that pneumonia is responsible for high morbidity and mortality in children under 5 years old: about 15% of all deaths in this age group are due to the disease.

Magnetic resonance imaging (MRI), chest X-ray (CXR), and computed tomography (CT) are imaging tests used to diagnose pneumonia [21]. CXR is the most common method to detect pneumonia worldwide due to its low cost, speed, and availability. In addition to pneumonia, radiography can be used to diagnose other pathologies, such as the detection of fractures [23, 31, 33], cavities [16], and tumors [32].

Pneumonia usually manifests itself as an area or areas of greater opacification in chest radiographs. Figure 1a shows a CXR of a patient with pneumonia. The opaque regions in this figure confirm the diagnosis of pneumonia. Figure 1 also shows other anatomical structures, such as the heart, bones, and blood vessels, as shown in Fig. 1b.

Fig. 1 a Example of pulmonary opacities; b Normal chest radiography showing the main identifiable anatomical structures (LA left atrium, LV left ventricle, AD right atrium)

Detecting pneumonia on CXR is a challenge even for experienced professionals, as other lung abnormalities, such as lung cancer and excess fluid, can produce similar opacities. In addition to such lung abnormalities, the expert’s interpretation can also be influenced by the position of the patient and the level of inspiration at the moment of the examination.

In addition to the factors inherent to the examination, other factors at the moment of analysis and interpretation can contribute to an erroneous diagnosis: (1) subjectivity and experience of the professional, (2) work fatigue due to repetitive actions, and (3) lighting levels in the consultation room.

Given this scenario, the detection of pneumonia in children from the conventional analysis of CXR images is time-consuming and subjective. These factors delay both diagnosis and treatment. Therefore, there is a need for a reliable system capable of overcoming such difficulties so that the professional can diagnose and refer the patient for treatment in real time and with greater assurance.

Artificial intelligence techniques are multi-purpose procedures mainly dedicated to classification, prediction, and clustering [26]. Due to its significant impact on childhood morbidity, pneumonia is a disease that requires rapid diagnosis and appropriate treatment. Computer-aided diagnosis (CAD) systems using Internet of Things technology have gained ground in the medical community. These systems are justified by their fast response and the increased accuracy of medical diagnoses, factors that are especially important in regions where clinical conditions are precarious. Real-time IoT systems to assist doctors have proven to be effective: they have been successful in detecting and analyzing cranial effusions in computed tomography images [5, 28] and in classifying electrocardiograms (ECG) [10], among other applications [24].

According to the United Nations International Children’s Emergency Fund (UNICEF), one way to obtain better results in the clinical treatment of pneumonia in children is to carry out real-time diagnoses. Thus, a real-time IoT system that doctors can use anywhere would allow more people to be diagnosed rapidly, regardless of the subjective and structural factors discussed above.

The emergence of Convolutional Neural Networks (CNNs) and recent advances in computational power have boosted progress in using computational methods to assist specialists in analyzing clinical diseases. One of the main factors behind the high performance of CNNs is their ability to learn high-level abstractions from raw data, which can be reused across tasks through transfer learning techniques.

Advances in the use of CNNs have improved specialists’ performance in segmenting anatomical structures [6, 9, 11, 22] and in classifying and detecting diseases [5]. Several important approaches to detect pneumonia have been proposed by Ayan and Ünver [1], Chouhan et al. [4], and Kermany et al. [18].

The approach proposed in this work is promising because it integrates proven methods, which have given excellent results with low computational costs, and allows their implementation in a freely accessible IoT system. The proposed system achieves accuracy superior or equal to that of previously proposed methods, while offering stability, computational efficiency, and accessibility, thereby overcoming socio-economic barriers.

Considering the context of lung diseases, the difficulty of diagnosing pneumonia from CXR images, and the possibilities offered by the Internet of Things (IoT) and transfer learning, this article proposes an automatic, fast, and accessible system that assists doctors in the diagnosis of pneumonia. This system uses and complements techniques already consolidated in the literature through new combinations of CNNs with machine learning classifiers, all in a real-time IoT system.

The main contributions of this work are listed below:

  1. The analysis of twelve CNNs as feature extractors combined with seven classic classifiers to maximize the performance of a system aimed at medical assistance in diagnosing pneumonia.

  2. A method consolidated in a real-time IoT system for clinical predictions in the medical field.

  3. An interactive and easy-to-use system to assist specialists in the diagnosis of pneumonia in children.

The remainder of this article is organized as follows: Sect. 2 presents a brief literature review of studies for the diagnosis of pneumonia; Sect. 3 presents the methodology of the proposed approach; Sect. 4 presents the results and their respective discussions, while Sect. 5 presents the main conclusions and possible future work.

2 Related works

In recent years, significant progress has been made in developing antibiotics, vaccines, and pneumonia treatments. However, this disease remains a public health problem. Consequently, research using computational methods capable of segmenting, detecting, and classifying pneumonia to support the medical community is recurrent in the literature [8].

Motivated by the satisfactory results obtained with CNNs in the diagnosis of other diseases [2, 19], the main methods for detecting pneumonia have also explored the use of CNNs.

An extensive literature search showed that the most satisfactory results in the diagnosis of pneumonia were obtained in the works of Ayan and Ünver [1], Chouhan et al. [4], and Kermany et al. [18]. All of these authors used CNNs in their research.

Ayan and Ünver [1] used the Xception and VGG16 architectures with transfer learning and fine-tuning. They modified the Xception model by freezing the last 10 layers of the network and adding two fully connected layers and a two-way output layer with a SoftMax activation function. The motivation presented is that the greatest generalization capacity lies in the first layers of the network. In the VGG16 architecture, the final eight layers were frozen, and the fully connected layers were changed. The work obtained test times per image of 16 ms and 20 ms for the VGG16 and Xception networks, respectively.
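As an illustration of this layer-freezing strategy, the sketch below shows a generic transfer-learning setup in Keras (the framework adopted later in this work). The choice of which layers to freeze, the size of the added dense layer, and the optimizer are illustrative assumptions and do not reproduce the exact configuration of Ayan and Ünver [1].

```python
# Generic transfer-learning sketch with layer freezing (illustrative values).
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze a chosen subset of layers so that only the remaining ones are
# updated during fine-tuning (the subset below is an arbitrary example).
for layer in base.layers[:-8]:
    layer.trainable = False

# New fully connected layers and a two-way SoftMax output.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)        # 256 units: assumed, not from [1]
out = Dense(2, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```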

Chouhan et al. [4] used the AlexNet, DenseNet121, InceptionV3, ResNet18, and GoogLeNet architectures. The diagnosis is based on a classifying committee composed of CNN models, in which each model contributes to the diagnosis through a vote. Majority voting was used to combine the results of the classifiers, so the diagnosis corresponds to the class that received the most votes. This approach obtained an average test time per image of 161 ms. In addition, they achieved high classification percentages for X-ray images, showing that deep networks are a research direction that can help diagnose pneumonia. In our approach, twelve architectures are evaluated, compared with the five used by Chouhan et al. [4]. Furthermore, we adopt traditional classifiers to reduce the computational cost of classification.
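The majority-voting rule used by Chouhan et al. [4] can be summarized by the toy sketch below; it only illustrates how votes from several models are combined and does not reproduce their CNN ensemble.

```python
# Toy majority-voting sketch over per-model class predictions.
import numpy as np

def majority_vote(predictions):
    """predictions: array of shape (n_models, n_samples) with class labels."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # Count the votes for each class, per sample, and keep the most voted class.
    votes = np.apply_along_axis(
        lambda column: np.bincount(column, minlength=n_classes), 0, predictions)
    return votes.argmax(axis=0)

# Three hypothetical models voting on four samples (0 = normal, 1 = pneumonia).
print(majority_vote([[0, 1, 1, 0],
                     [1, 1, 0, 0],
                     [1, 1, 1, 0]]))   # -> [1 1 1 0]
```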

In both works, data augmentation was performed with variations of displacement, zoom, inversion, and rotation at angles of \(40^\circ\), as well as random horizontal flips and random resized crops, for example.

Kermany et al. [18] developed a method for diagnosis from optical coherence tomography (OCT) images. The authors then applied the method to the diagnosis of pneumonia in children, focusing on proving the generalization of the proposed system. The work did not make comparisons with other similar methods or report the computational costs required to reach its results.

Despite obtaining satisfactory results, all the approaches above generated their results with only 624 images, which corresponds to about half the number of images tested in our proposed work. The results above would be more reliable if they had been obtained with a larger number of images and reported standard deviations.

Another prominent approach is that of Rahimzadeh et al. [27], who proposed to classify X-ray examinations as COVID-19, pneumonia, or normal. They proposed a network based on the concatenation of the Xception and ResNet50V2 networks. Due to the imbalance in the number of samples, the network’s training was divided into eight stages with equivalent numbers of samples. The results show a disparity between the classes and a limitation in the generalization of the system to new COVID-19 samples: the method uses the same COVID-19 images at each new training stage and only 31 images for validation. The reliability of the results would be greater with data augmentation, reported standard deviations, and a demonstration of the computational costs.

Table 1 shows a chronological summary of the works compared with the proposed approach. It is organized by approach, main highlights, disadvantages, and integration, if any, into an IoT system.

Table 1 Chronological summary of the works compared with the proposed approach. The table shows the approach, the main highlights, and the disadvantages, as well as whether the work related its application to an IoT system

Some companies have developed devices capable of assisting doctors in diagnosing pneumonia. However, acquiring these systems requires high purchasing power to pay for the license, making them inaccessible to professionals at the beginning of their careers or to hospitals with limited resources.

The proposed work uses CNNs and transfer learning combined with classic classifiers in an IoT system accessible to all. These characteristics enable high performance and low computational costs while overcoming the financial problems mentioned. An IoT system makes it possible to diagnose and identify pneumonia inside and outside hospital environments; the only requirements to use the platform are a device connected to the internet and the digital image of the exam. The rapid response of the system makes real-time results possible during a consultation. In the proposed work, the effectiveness of twelve CNN architectures is investigated, unlike the works of Ayan and Ünver [1] and Chouhan et al. [4], which used only two and five models, respectively. Furthermore, the stability and the computational cost obtained are demonstrated by reporting the standard deviations and the training and test times, unlike the other above-mentioned approaches.

3 Proposed approach and methodology

In this section, the methodology of this approach is contextualized and described. The section is divided into Dataset (Sect. 3.1), Preprocessing (Sect. 3.2), Convolutional neural networks as feature extractors (Sect. 3.3), Classification (Sect. 3.4), IoT framework (Sect. 3.5), and Evaluation metrics (Sect. 3.6).

3.1 Dataset

The chest X-ray dataset used is a collection of CXR images of children. The challenge of this dataset is to detect the presence or absence of pneumonia in radiological exams acquired in the anteroposterior position. The CXR images were made available by the Guangzhou Women and Children’s Medical Center.

The images come from routine examinations carried out in the pediatric section of the medical center. The patients, whose identities were withheld when the database was built, were aged between 1 and 5 years. The dataset is composed of 6,000 images with sizes ranging from 384\(\times\)127 to 2916\(\times\)2583 pixels. All the CXR examinations were subjected to expert analysis; the experts selected good-quality images and provided their respective diagnoses. The characteristics of this database make it suitable for validating automatic computational classification methods (Fig. 2).

Fig. 2 Samples of each class of the CXR dataset. From left to right, sets of five images for the classes a normal, b pneumonia

3.2 Preprocessing

The preprocessing used in this approach is an adaptive histogram equalization. The dataset contains a wide variety of CXR exams due to differences in image size and patient age, as well as the equipment used and the technician responsible. These conditions determine the contrast quality and resolution of the exam, and poor conditions can at times hinder the diagnosis. To work around this problem, we applied an adaptive histogram equalization based on sub-regions of the images.

This type of equalization was used to improve the contrast throughout the exam, especially in the pulmonary region. The sample images shown below in Fig. 3 contain areas external to the lung; if an equalization based on global image parameters were performed, it would be difficult to enhance the lung regions that are important for diagnosis. Figure 3 shows that small regions within the lung are highlighted, emphasizing characteristics of the individual’s lung without external influence.

Fig. 3 Samples of each class, for classes a normal and b pneumonia, from the chest radiography dataset. From left to right, we compare the original sample and the preprocessed one, with a highlight after preprocessing

The regions to be equalized individually are defined by dividing the input image into n regions of the same size. These regions are equalized separately, providing full equalization of the image. The smaller the selected regions, the more sensitive the method is to noise. This susceptibility to unwanted information is circumvented with a contrast limit: if the limit is exceeded, the excess pixels are redistributed evenly before equalization. Finally, after equalizing all image regions, a bilinear interpolation is performed to remove artifacts from the edges of each area.

Figure 3 shows the change produced by the adaptive histogram equalization in two samples of the dataset. All the images were resized to the input size of each CNN configuration. This resizing has a low computational cost because it uses nearest-neighbor interpolation from the OpenCV library. The average preprocessing time over all the images of the dataset was 14.924 ± 8.710 ms.
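The description above corresponds to contrast-limited adaptive histogram equalization (CLAHE) as implemented in OpenCV. The sketch below is a minimal version of this preprocessing step; the clip limit, tile grid size, target size, and file path are assumed values for illustration, not necessarily the parameters used in this work.

```python
# Minimal preprocessing sketch: CLAHE over sub-regions followed by
# nearest-neighbor resizing (illustrative parameter values).
import cv2

def preprocess_cxr(path, target_size=(224, 224)):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)            # read CXR as grayscale
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(img)                             # per-region equalization
    resized = cv2.resize(equalized, target_size,
                         interpolation=cv2.INTER_NEAREST)    # low-cost resizing
    return cv2.cvtColor(resized, cv2.COLOR_GRAY2RGB)         # 3 channels for the CNNs

# sample = preprocess_cxr('chest_xray/pneumonia/sample.jpeg')  # hypothetical path
```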

3.3 Convolution neural networks as feature extractors

The feature extraction covered in this subsection is based on transfer learning with CNNs. This process in a CNN can be divided into the following steps: input of the image to be processed, application of non-linear transformations that result in a set of matrices smaller than the input image [12], formation of feature vectors [20], and application of this vector to a structure composed of one or more layers of multiple perceptrons, called fully connected layers [25]. One of the main pitfalls of using CNNs, and deep learning more generally, is fitting many parameters with few samples, which results in overfitting [34].

The main idea of transfer learning is to transfer knowledge learned on one challenge to another [17]. Transfer learning helps to minimize overfitting, thus maintaining the generalization ability of the approach even with few samples. The pre-training requires extensive and varied datasets, such as the ImageNet dataset [25]. Because of this, we selected 12 high-performance CNNs pre-trained on the ImageNet dataset. The CNNs used are listed in Table 2, along with some of their important characteristics (Fig. 4).

Fig. 4 Method flow

In the pre-trained CNNs, we preserved the network parameters from the original papers and changed the network structures to allow transfer learning. As shown in Fig. 4, the structural difference consists of removing the fully connected layers, which are responsible for the characteristically high computational cost of training and classification [7, 17]. In summary, the output of the new structure no longer corresponds to class probabilities but rather to an extensive feature vector. The number of features extracted by each configuration is shown in Table 2. Thus, the set of features from each configuration forms a new dataset.

Table 2 Configurations of convolutional neural networks used in this work

All the CNNs used in the transfer learning process were subjected to the extraction of the same number of samples. The new dataset produced by deep feature extraction comes from 80% of the chest X-ray dataset, that is, 4,800 images. This number of original images, together with the number of features extracted by each configuration (Table 2), shows the large amount of information produced by this method, which allows the classifiers to be trained robustly and reliably.
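As a minimal sketch of this deep feature extraction, the snippet below removes the fully connected layers of a pre-trained network and returns one feature vector per image. VGG19 with global average pooling is used only as an example; the pooling strategy and feature dimensionality of each configuration follow Table 2.

```python
# Deep feature extraction sketch: pre-trained CNN without the fully
# connected layers, producing one feature vector per image.
import numpy as np
from keras.applications import VGG19
from keras.applications.vgg19 import preprocess_input

extractor = VGG19(weights='imagenet', include_top=False, pooling='avg')

def extract_features(batch):
    """batch: array of shape (n, 224, 224, 3) with preprocessed CXR images."""
    return extractor.predict(preprocess_input(batch.astype('float32')))

# features = extract_features(images)   # e.g. shape (n, 512) for this configuration
```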

3.4 Classification

The classification is performed after the attributes are obtained from the CNN. This section describes the five machine learning algorithms employed in this step: Naive Bayes, Multilayer Perceptron (MLP), k-Nearest Neighbors (kNN), Random Forest (RF), and Support Vector Machines (SVM), the last with linear, polynomial, and RBF kernels, giving seven classifier configurations in total.

The Naive Bayes classifier belongs to a group of supervised classifiers whose machine learning approach is based on Bayes’ decision theory. The algorithm uses the prior probability of each possible class and the conditional probability of the sample attributes given each class. The classifier equation is thus based on the conditional density, the a priori probability, and the probability density. This algorithm stands out for assuming independence among the features extracted from the sample; that is, given the class, the attributes are treated as uncorrelated with one another.

MLP is a supervised algorithm designed to solve problems that are not linearly separable. The structure of this neural network is composed of multiple layers. The input layer of the network receives the feature vector that represents the sample. The hidden layers produce activations, and the weights between the interconnected layers are adjusted with the error backpropagation algorithm. In the output layer, the value of each perceptron represents the network output for each possible sample class.

kNN determines the class of an unknown sample from the spatial distribution of its features, identifying the k nearest samples. The similarity between samples is given by calculating the distance between them; the most commonly used distance is the Euclidean distance.

RF builds an ensemble of decision trees from the training features presented to the classifier. The trees are fitted using an ensemble approach known as bagging with random feature selection. The main advantage of this classification method is that it does not tend to overfit as the number of trees increases.

SVMs are classification methods based on mapping the data into a different space. An SVM classifies an unknown sample based on the statistical learning acquired from the features presented during training. This statistical learning comes from determining the hyperplanes that separate the data; the hyperplane is defined according to the kernel used in training, with linear, polynomial, and Radial Basis Function (RBF) kernels being common choices.
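For reference, the seven classifier configurations described above can be instantiated with scikit-learn as sketched below. The Gaussian variant of Naive Bayes and the default hyperparameters are assumptions for illustration; the hyperparameters actually used come from the random search described next (Table 3).

```python
# Sketch of the seven classifier configurations (scikit-learn defaults).
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

classifiers = {
    'Naive Bayes': GaussianNB(),
    'MLP': MLPClassifier(),
    'kNN': KNeighborsClassifier(),
    'Random Forest': RandomForestClassifier(),
    'SVM (linear)': SVC(kernel='linear'),
    'SVM (polynomial)': SVC(kernel='poly'),
    'SVM (RBF)': SVC(kernel='rbf'),
}
```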

As mentioned in Sect. 3.3, 80% and 20% of the dataset were used for training and testing, respectively. The training and test procedures used for the classifiers described in this section are detailed below. The same process and data split were applied equally to all classifiers, producing the results in Table 4.

The hyperparameters used in classifier training come from a random search of 20 iterations. Table 3 contains the definitions of the classifier parameters included in the random search. The random search selects the best configuration for each classifier, which is saved according to the average score obtained in cross-validation. The tenfold cross-validation for each classifier uses 80% of the dataset. For the Naive Bayes classifier, we did not use tenfold cross-validation.

After acquiring the best configuration for each classifier, it was applied to the remaining 20% of the dataset to determine whether each CXR exam belonged to an individual with pneumonia or not.
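The selection procedure can be sketched with scikit-learn as follows: a 20-iteration random search with tenfold cross-validation on the training split of the extracted features, followed by evaluation on the held-out 20%. The variables `features` and `labels` and the SVM parameter grid are placeholders for illustration; the actual search spaces are those of Table 3.

```python
# Hyperparameter random search sketch (illustrative parameter grid).
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# `features` and `labels` stand for the extracted deep features and diagnoses.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.20, stratify=labels, random_state=0)

search = RandomizedSearchCV(
    SVC(),
    param_distributions={'kernel': ['linear', 'poly', 'rbf'],
                         'C': [0.01, 0.1, 1, 10, 100],
                         'gamma': [1e-4, 1e-3, 1e-2, 1e-1]},
    n_iter=20, cv=10, scoring='accuracy', n_jobs=-1)
search.fit(X_train, y_train)

print(search.best_params_, search.score(X_test, y_test))
```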

Table 3 Setup to search for hyperparameters of the classifiers

3.5 IoT framework

The Lapisco Image Interface for Development of Applications (LINDA) system is an IoT framework characterized by accessibility and medical diagnostic assistance. These characteristics come from the fact that the system can be accessed from any portable device with an internet connection. The proposal is to insert this real-time pneumonia detection system as part of the preliminary diagnostic process. Figure 5 shows the LINDA system’s flowchart, with each step described in the image caption.

This IoT framework is structured around the communication between a web service and a cloud processing service. The web service is developed in the Java language, which allows easy manipulation of the data and settings necessary to generate the classification results. This layer of the platform is also responsible for connecting mobile devices and computers to the computational cloud.

Standardizing the images inserted by the user makes the platform compatible with different types of exams, demonstrating its robustness. Standardization consists of adjusting the size, format, and color conversion of the inserted images. Furthermore, the user has free choice over the available combinations, which, once generated, only need to be reloaded.
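From the client side, submitting an exam reduces to sending the image to the platform and reading back the predicted class. The snippet below is a purely hypothetical illustration of such an interaction in Python; the endpoint URL and response fields do not correspond to the actual LINDA API, whose web service is implemented in Java.

```python
# Hypothetical client-side request to a cloud classification service.
import requests

with open('exam.jpeg', 'rb') as f:                                       # user's CXR image
    response = requests.post('https://linda.example.org/api/classify',   # placeholder URL
                             files={'image': f})

print(response.json())   # e.g. {'diagnosis': 'pneumonia', 'confidence': 0.96} (illustrative)
```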

The hardware and software configuration responsible for the processing was an AMD Ryzen 7 2700X eight-core CPU with 16 threads and 32 GB of memory, running the Linux Ubuntu 16.04 64-bit operating system with no Graphics Processing Unit (GPU), Java version 1.8.0, Python version 3, Keras v2.3.1, Scikit-Learn v0.21.0, and OpenCV v4.1.0.

Fig. 5 LINDA system structure

3.6 Evaluation metrics

This work evaluated 84 combinations of CNN extractors with classic classifiers. Each combination was evaluated with respect to test time and the evaluation metrics Accuracy (Acc), Precision (Prec), Sensitivity (Sen), and F1 score. The prediction time corresponds to the average prediction time over the test samples, which comprise 20% of the dataset (1,200 images).

The behavior of each classifier was evaluated using metrics obtained from the confusion matrix. Figure 6 shows the confusion matrix with the Normal and Pneumonia classes, where the normal class represents the group of healthy patients. In this matrix, true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are arranged in rows and columns.

Fig. 6 The confusion matrix structure

TP is the number of radiographs from patients with pneumonia that were correctly classified. FP is the number of radiographs from patients without pneumonia that were incorrectly classified as pneumonia. TN is the number of radiographs from healthy patients that were classified correctly. FN is the number of radiographs from patients with pneumonia that were incorrectly classified as normal.

Equations 1–4 correspond to the calculations of the Accuracy (Acc), Precision, Sensitivity, and F1 score metrics, respectively. Accuracy is the frequency at which the model correctly classifies a patient as healthy or ill. Precision indicates how many of the patients predicted to have pneumonia were actually ill. Sensitivity (Sen) indicates how many of the patients with pneumonia were correctly identified. F1 score is the harmonic mean between Precision and Sensitivity.

$$\begin{aligned} \mathrm{Acc} = \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} \end{aligned}$$
(1)
$$\begin{aligned} \mathrm{Prec} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \end{aligned}$$
(2)
$$\begin{aligned} \mathrm{Sen} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Sen} \times \mathrm{Prec}}{\mathrm{Sen}+\mathrm{Prec}} \end{aligned}$$
(4)
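For clarity, the metrics of Eqs. (1)–(4) can be computed directly from the confusion-matrix counts, with pneumonia treated as the positive class. The counts used in the example call below are illustrative only.

```python
# Metrics of Eqs. (1)-(4) from confusion-matrix counts (pneumonia = positive class).
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    sen = tp / (tp + fn)
    f1 = 2 * (sen * prec) / (sen + prec)
    return acc, prec, sen, f1

print(metrics(tp=580, tn=578, fp=22, fn=20))   # illustrative counts, not Table 5
```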

4 Results and discussion

This section presents the computational results for the classification of radiographic images for patients with pneumonia and without pneumonia. The results achieved by this work are compared against the results obtained by other works in the literature.

4.1 Results

Table 4 shows the mean values and their respective standard deviations for the metrics Accuracy, F1 score, Sensitivity, and Precision, in addition to the times measured in each step of the method. The proposed approach used twelve CNNs as feature extractors and seven classic classifiers, making up 84 extractor–classifier pairs. The pairs that reached values higher than 96.00% for the evaluation metrics Accuracy, F1 score, Sensitivity, and Precision are shown in bold.

Table 4 Accuracy, F1 score, Sensitivity, Precision, Extraction time, Training time (TrT), and Test time (TsT) obtained by classifying features extracted by different combinations of CNN architectures and features classifiers

A total of 19 combinations achieved at least 95.00% for all metrics on the test samples. The DenseNet121, VGG16, and VGG19 extractors stood out with the MLP and SVM (RBF kernel) classifiers; the MobileNet extractor reached values above 95.00% with the MLP, nearest-neighbors, and SVM classifiers with the linear, polynomial, and RBF kernels; Xception had excellent results with MLP and linear SVM; and both DenseNet169 and DenseNet201 achieved these high values with MLP, linear SVM, and RBF SVM.

The VGG19 network with the SVM-RBF achieved the highest Accuracy. This configuration gave Accuracy, F1 score, Sensitivity, and Precision averages of 96.468%, 96.461%, 96.468%, and 96.463%, respectively. In addition to this combination, six other combinations reached accuracy values above 96.00%.

Table 4 also shows the extraction, training, and test times for the combinations. Knowledge of these times is essential for assessing the computational cost, and time is also one of the crucial parameters for embedded systems. Furthermore, the standard deviations of the results demonstrate the stability and reliability of the proposed method.

The results show that the MLP classifier achieved the fastest test times among the combinations that reached Accuracy greater than 95.00%, owing to the low number of neurons in its hidden layer. The nearest-neighbors classifier had the slowest test times because it compares each test sample against all the feature vectors extracted by the CNN.

The best combination, VGG19 with SVM-RBF, obtained the fastest training time among the combinations with Accuracy greater than 95%. The training time obtained is 43.972 ± 14.761 s, which, added to the extraction time, is less than a minute.

Table 5 shows the classes under study and the classification results obtained for the best extractor–classifier combinations. More precisely, Table 5 shows the ability to differentiate between classes through the number of samples correctly and incorrectly predicted. DenseNet201 with MLP and with SVM RBF obtained the lowest classification error for patients who did not have pneumonia. In addition, MobileNet with polynomial SVM and DenseNet169 with RBF SVM achieved the smallest classification error for patients who had pneumonia. Figure 7 highlights the results of the best combinations shown in Table 5, presenting, for each combination, the Accuracy and the test time per image with their respective standard deviations.

Table 5 Confusion matrix of the extractor–classifier combinations that reached values above 96.00%
Fig. 7 Accuracy and testing time for the best combinations of feature extractor with classifier (C-1: MobileNet + SVM(Polynomial), C-2: DenseNet169 + MLP, C-3: DenseNet169 + SVM(RBF), C-4: DenseNet201 + MLP, C-5: DenseNet201 + SVM(RBF), C-6: VGG16 + SVM(RBF), and C-7: VGG19 + SVM(RBF))

4.2 Comparison with literature works

In this subsection, we compare our results with those of other works to evaluate and validate the proposed approach. Table 1 shows the approach, the main highlights, and the disadvantages of these works, and Table 6 compares the results obtained by the proposed approach with those of the other studies.

Table 6 Comparison between the proposed approach with the other works in the literature

The analysis of 84 different combinations demonstrated that the VGG19 extractor combined with the SVM classifier with the RBF kernel was the best model for detecting pneumonia in CXR images. This approach achieved higher values in the metrics Accuracy, F1 score, and Precision than the other works in the literature evaluated here.

Another highlight of the proposed method is the low computational cost obtained with the extractor–classifier combinations. The proposed approach obtained lower average times per image than the other approaches.

The average test time per image obtained by the VGG19 with SVM-RBF approach was 5.570 ± 2.032 ms. Table 6 shows that our approach is three and four times faster than the VGG16 and Xception models of Ayan and Ünver [1], respectively, and 28 times faster than the work of Chouhan et al. [4].

In addition to achieving the highest values for the Accuracy, F1 score, and Precision metrics combined with the fastest time per image, our approach is the only one that linked the proposed model to a real-time IoT system. As our work also focuses on availability of access for specialists through a consolidated IoT system, the user can insert an image of the radiographic examination into the system and, seconds later, receive the diagnosis based on the radiograph. Thus, the specialist will have the support of the system and more confidence when diagnosing a patient. The idea is not to replace the professional but to speed up and confirm the diagnosis of traditional methods, consequently improving the quality of treatment.

Thus, in summary, the main results of this work are:

  • The CNN VGG19 combined with the SVM classifier with the RBF kernel formed the best model for detecting pneumonia in CXR images.

  • The approach achieved 96.47% accuracy, 96.46% F1 score, 96.47% sensitivity, and 96.46% precision.

  • The test time per image is 5.570 milliseconds.

  • The approach is available in an IoT system consolidated by the medical community.

5 Conclusion and future works

This work proposes a method using CNNs as extractors of image characteristics and classic classifiers in a real-time IoT system to aid in the diagnosis of pneumonia in children. An extensive comparison of the method was carried out with twelve CNNs combined with seven classifiers equally tested on 1,200 CXR images of children.

The results were compared with the principal works of the literature in this area. The results of the proposed method showed that the combination of the convolutional neural network VGG19 and the classic SVM classifier with the RBF kernel was the best model. The results obtained in the metrics Accuracy, F1 score and Precision were, respectively, 96.47%, 96.46%, and 96.46%. This combination also had the lowest times compared to the other works reported in the literature, that is, 43.972 ± 14.761 s and 5.570 ± 2.032 ms for the training and classification times, respectively.

Another major contribution of this work is its availability to anyone through an IoT system consolidated in the medical community. The system allows the insertion of an X-ray exam image and produces the result in real time: LINDA classifies the image and returns a result confirming the presence or absence of pneumonia. Consequently, the information from this system will help make the diagnosis more accurate and consistent.

Future work will focus on segmenting the lung regions when and where pneumonia is present. Another direction is the progressive historical monitoring of the patient and the affected region, assisting the pulmonologist in a visual analysis of the patient’s ongoing condition.