Abstract

The emergence of cognitive computing and big data analytics has revolutionized the healthcare domain, particularly cancer detection. Lung cancer is one of the leading causes of death worldwide. Pulmonary nodules in the lung can become cancerous as they develop. Early detection of pulmonary nodules can lead to early treatment and a significant reduction in deaths. In this paper, we propose an end-to-end convolutional neural network- (CNN-) based automatic pulmonary nodule detection and classification system. The proposed CNN architecture has only four convolutional layers and is, therefore, lightweight. Each convolutional layer consists of two consecutive convolutional blocks, a connector convolutional block, nonlinear activation functions after each block, and a pooling block. The experiments are carried out using the Lung Image Database Consortium (LIDC) database. From the LIDC database, 1279 sample images are selected, of which 569 are noncancerous, 278 are benign, and the rest are malignant. The proposed system achieved 97.9% accuracy. Compared to other well-known CNN architectures, the proposed architecture has far fewer FLOPs and parameters and is thereby suitable for real-time medical image analysis.

1. Introduction

Owing to advances in sophisticated machine learning algorithms, mobile computing, wireless communications [1, 2], and cognitive computing [3, 4], the healthcare industry has been booming in recent years. The traditional healthcare industry is now gradually shifting towards smart healthcare [5]. Smart healthcare enables patients to have their health problems diagnosed at home and to receive prescriptions and advice online, thereby saving the time needed for communication and for getting an appointment [6]. One of the major driving forces behind the rise of the smart healthcare industry is the development of deep learning algorithms in the machine learning domain [7]. Deep learning has brought about a paradigm shift in machine learning. Over the last ten years, it has been used in numerous signal and image processing applications, including medical signals and images [8–10].

Lung cancer has become one of the leading causes of death worldwide. According to 2019 statistics from the American Cancer Society, more than 142K people died of lung and bronchus cancer, and more than 228K people were diagnosed with it [11]. Early diagnosis can greatly reduce the number of cancer deaths.

Cancer can be detected early in two ways: manually by radiologists or automatically by a computer-aided diagnosis (CAD) system. A CAD system is not a standalone system. It can only assist radiologists or doctors in making a correct decision, and the final decision rests with them [12]. A radiologist must carefully observe the density of pulmonary nodules, because at an early stage this density may resemble that of other parts of the lung [13]. A CAD system tries to delineate the boundary of a pulmonary nodule by detecting distinguishing features in the nodule. These features are either hand-crafted or deep-learned. Hand-crafted features include information about texture, density, and morphology. The features are fed to a classifier for the detection or classification of the nodule.

CAD systems help radiologists improve the reading of computed tomography (CT) scans. However, a significant number of nodules remain undetected if a low false-positive rate is desired, which limits the use of CAD systems in practice [14, 15]. Nodules vary in shape, size, and type, and some even vary in texture and density. If the algorithm is not sophisticated enough, a CAD system may miss nodules with such wide variations.

Recently, because of the success of deep learning in numerous applications, CAD systems have also begun to use deep learning [16]. End-to-end deep learning has been successful in many medical image processing applications [17]. Pulmonary nodule detection systems for CT scan images have also recently adopted several deep learning architectures [18–20]. These systems outperformed systems based on hand-crafted features [21].

As healthcare shifts towards smart healthcare, smart healthcare frameworks increasingly use wireless communication and mobile computing. To date, 3G/4G/5G communication has been used successfully [22–24]. In [22], a blockchain-based security scheme was proposed. An automatic seizure detection system using a mobile framework was proposed in [23]. A deep learning-based network resource algorithm for 5G was proposed in [24]. The paradigm is now shifting towards beyond 5G/6G, which aims to provide low latency and high transfer rates and to accommodate many sensors [25]. Smart systems are becoming popular in many applications [26, 27]. One important aspect of smart healthcare is a cognitive computing component. Cognitive computing can facilitate health monitoring, medicine prescription, and mental state recognition [28]. A patient's emotions can reveal much about the patient's state. Therefore, recognizing the correct emotion can help in understanding a patient's situation.

In smart healthcare, multiple doctors at different physical locations can analyse a patient’s case. A lung CT scan image can be uploaded to a computer system that several registered doctors can access. The system can output a segmentation of any nodules and decide whether the image is normal, benign, or malignant.

In this paper, we propose a convolutional neural network- (CNN-) based pulmonary nodule detection and classification system. The classifier outputs one of three classes: normal, benign (low malignancy level), or malignant (high malignancy level). The performance of the proposed system is compared with that of some state-of-the-art related systems.

The paper is structured as follows: Section 2 briefly outlines some of the previous related works. Section 3 describes the proposed system for detecting and classifying pulmonary nodules. Section 4 delivers experimental results and discussions. Section 5 concludes the paper.

2. Related Work

Most previous works used the Lung Image Database Consortium (LIDC-IDRI) database [29]. Different works used different numbers of samples from the database, depending on their selection criteria. In this section, we mainly focus on works that used the LIDC database, although we also mention some other important works.

First, we describe works that used hand-crafted features to detect pulmonary nodules. Wu et al. proposed a nodule classification system using texture and radiological features [30]. They used 13 gray-level co-occurrence matrix (GLCM) texture features and 12 radiological features with a back-propagation neural network. A total of 2217 CT slices were used, and an area under the receiver operating characteristic (ROC) curve of 0.91 was obtained.
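GLCM texture features, used in [30] and [32], summarize how often pairs of gray levels co-occur at a fixed pixel offset. The pure-Python sketch below is illustrative only. It assumes an image that is already quantized to `levels` gray levels and stored as nested lists, and it computes three common Haralick-style features (contrast, energy, homogeneity). These are not necessarily the exact features used in [30] or [32].

```python
from collections import Counter
from math import fsum


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # symmetric: count the pair in both directions
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def glcm_features(p):
    """Three Haralick-style texture features of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": fsum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": fsum(v * v for _, _, v in cells),
        "homogeneity": fsum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


features = glcm_features(glcm([[0, 1], [0, 1]], levels=2))
print(features)  # → {'contrast': 1.0, 'energy': 0.5, 'homogeneity': 0.5}
```

In the two-column example with a horizontal offset, every co-occurring pair differs by exactly one gray level, so the contrast is 1.0. A uniform image gives a contrast of 0.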

In [31], shape- and texture-based features were combined with a genetic algorithm and a support vector machine (SVM) to detect nodules. Before feature extraction, the samples were enhanced using quality threshold and region growing-based segmentation. An accuracy of 97.5% was obtained with 140 samples from the LIDC database.

Orozco et al. developed a lung nodule classification system using 19 GLCM features, extracted from different subbands of the wavelet transform, and an SVM classifier [32]. An accuracy of 82% was obtained on a subset of the LIDC database. Han et al. used 3D GLCM features and an SVM for nodule classification and obtained an area under the ROC curve of 92.7% [33]. Nodule classification systems based on the phylogenetic diversity index and a genetic algorithm were proposed in [34]. A total of 1403 images from the LIDC database were used in the experiments, and the system achieved 92.5% accuracy.

Second, we describe works on pulmonary nodule detection and classification using deep learning, focusing mainly on papers from 2018 onwards. A 4-channel CNN-based system was proposed in [35]. In this system, the scan images were enhanced by a Frangi filter, and learning was based on a multigroup criterion. LIDC images were used, and a sensitivity of 80.1% was obtained.

A topology-based phylogenetic diversity index on CT scans was used with a CNN in [18]. In total, 1404 images from the LIDC database were used in the experiments, and an accuracy of 92.6% was obtained. The images consisted of 394 malignant and 1011 benign nodules. A fusion of classifications using an AdaBoost back-propagation neural network was used in [36]. Three types of features were used: GLCM features, Fourier shape features, and features obtained from a CNN architecture. Three neural networks learned these three feature sets, and their outputs were fused. In total, 1972 sample images (648 malignant and 1323 benign) from the LIDC database were used in the experiments, and an accuracy of 96.7% was achieved.

Xie et al. proposed a nodule detection system using a faster region-based CNN [37]. A 2D convolutional operation was used to reduce false positives. The system achieved 86.4% accuracy using 150414 images. An end-to-end automated lung nodule detection system with three main phases was developed in [38]. Using 888 CT scans, the system achieved 91.4% accuracy at one false positive per scan.

The above review shows that significant progress in lung nodule detection has been made during the last seven to eight years. However, challenges remain, including the detection and classification of nodules that vary widely in size, shape, and density. Therefore, there is a need for a fully automated system that can overcome some of these challenges.

3. Proposed System

Major CNN architectures such as AlexNet, VGGNet, GoogLeNet, and ResNet were designed to classify natural images into around 1000 classes. These architectures were trained on millions of images and were therefore designed as very deep models. By contrast, medical image data are not available in abundance, and training such very deep architectures on limited data can cause overfitting. In addition, visualizations of these very deep models may not be meaningful for medical data.

3.1. CNN Architecture

In this paper, we developed a CNN architecture that is lightweight (not very deep) and appropriate for medical image processing. The overall architecture is shown in Figure 1. There are four convolutional layers, followed by global average pooling (GAP), two fully connected (FC) layers, and a softmax output layer with three output neurons corresponding to the three classes (normal, benign, and malignant). Each convolutional layer has two successive convolutional blocks with rectified linear units (ReLUs), a connector convolutional block with a ReLU, and a max pooling block. The number of filters in each convolutional block is 16 in the first layer, 32 in the second, 48 in the third, and 64 in the last. The stride of the filters is 1. The output of the connector convolutional block is summed with the output of the second convolutional block before max pooling. The stride of the max pooling is 2, so each spatial dimension is halved and the number of pixels is reduced by a factor of 4. Zero padding is applied before each convolutional block to maintain the spatial size. Batch normalization is applied in each layer to speed up training. GAP serves as a pooling operation but is more efficient than conventional pooling [39].
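To make the layer structure concrete, the following sketch walks through the four convolutional layers and counts their weights. It is a back-of-the-envelope calculation under stated assumptions, not the authors' implementation. Because the text does not give kernel sizes, the sketch assumes 3 × 3 kernels for the two successive blocks and a 1 × 1 kernel for the connector block. The connector is assumed to take the layer input, so that its output can be summed with the second block's output. The sketch also assumes a 2 × 2 pooling window and a single-channel input. The input side length is a hypothetical value because the input size is not stated. Batch normalization and the FC layers, whose widths are not given, are not counted.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + c_out


def walk_layers(input_side, in_channels=1, filters=(16, 32, 48, 64)):
    """Return (filters, output side, parameters) per layer and the total, under the
    assumptions stated above (3x3 blocks, 1x1 connector, 2x2 pooling)."""
    side, c_in, total, report = input_side, in_channels, 0, []
    for f in filters:
        # Two successive 3x3 blocks; zero padding and stride 1 keep the spatial size.
        params = conv_params(3, c_in, f) + conv_params(3, f, f)
        # Assumed 1x1 connector on the layer input, summed with the second block.
        params += conv_params(1, c_in, f)
        side //= 2  # max pooling with stride 2 halves each side (4x fewer pixels)
        report.append((f, side, params))
        total += params
        c_in = f
    return report, total


report, total = walk_layers(64)  # 64 is a hypothetical input side length
for f, side, params in report:
    print(f"{f:2d} filters -> {side}x{side} maps, {params} conv parameters")
print("total conv parameters:", total)
```

With a hypothetical 64 × 64 input, the feature maps shrink to 32, 16, 8, and 4 pixels per side. Under these assumptions, the convolutional weights total 120,960. This count excludes the FC and batch normalization parameters, so it is not directly comparable to the reported total of about 200K.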

The input to the CNN is the image of size . Four convolutional layers are used so that the receptive field covers the whole image. We also tested three layers, but four layers performed better. Each pixel of the input image is normalized by subtracting the mean and dividing by the standard deviation of the pixels of the whole database.
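The following is a minimal sketch of this database-level normalization in pure Python. It assumes that images are nested lists of pixel values and uses the population standard deviation, because the text does not say which estimator was used. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_normalizer(images):
    """Mean and (population) standard deviation over all pixels of all images."""
    pixels = [v for image in images for row in image for v in row]
    return fmean(pixels), pstdev(pixels)


def normalize(image, mean, std):
    """Subtract the database mean from each pixel and divide by the database std."""
    return [[(v - mean) / std for v in row] for row in image]


database = [[[0, 2]], [[4, 6]]]  # two toy 1x2 "images"
mean, std = fit_normalizer(database)
print(mean, std, normalize([[3, 6]], mean, std))
```

In a cross-validation setting, one reasonable choice is to compute these statistics on the training folds only, so that no test information leaks into preprocessing. The text states that the whole database was used.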

The minibatch size was 4 samples, and the cost function was categorical cross-entropy. The samples were shuffled before each minibatch to ensure complete randomization of the learning, which also helped to reduce overfitting. The initial weights were obtained by applying the normalization in [40]. The Adam optimizer was used to optimize the weights. The parameters were , and the learning rate was . The proposed CNN architecture is a modified version of the architecture proposed in [41]. The main difference between the two architectures is the number of layers. The proposed architecture has fewer layers, which makes the model lightweight.
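The training details above can be sketched as shuffled minibatches of four samples with a categorical cross-entropy cost on softmax outputs. This pure-Python illustration is not the authors' code. The text says the samples were shuffled before each minibatch, which is interpreted here as reshuffling once per pass over the data. The `eps` clamp that avoids log(0) is an added safeguard. The Adam update and weight initialization are omitted because their parameters are not specified.

```python
import math
import random


def minibatches(samples, batch_size=4, rng=None):
    """Shuffle the samples once, then yield consecutive minibatches of batch_size."""
    if rng is None:
        rng = random.Random()
    order = list(samples)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean of -log(probability of the true class) over a batch of softmax outputs."""
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


shuffler = random.Random(0)
for epoch in range(2):  # a fresh shuffle on every pass over the data
    for batch in minibatches(range(10), batch_size=4, rng=shuffler):
        print(epoch, batch)
print(categorical_cross_entropy([[0.5, 0.25, 0.25]], [1]))  # equals -log(0.25)
```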

3.2. Database
3.2.1. Database Selection

The experiments used the publicly available LIDC-IDRI database [29]. It contains 1018 CT scans of 1010 subjects from seven institutions. The slice thickness of the CT scans varies from 0.6 mm to 5.0 mm, with a median of 2.0 mm. Four expert radiologists annotated the scans in two separate reading phases. In the first reading phase, each radiologist independently classified each suspicious lesion as a nonnodule, a nodule smaller than 3 mm, or a nodule greater than or equal to 3 mm. In the second reading phase, 3D segmentation was performed for the nodules greater than or equal to 3 mm.

3.2.2. Samples’ Selection

The samples were selected as follows. First, all scans with a slice thickness above 3.0 mm were removed. Nodules smaller than 3 mm were also removed. Nodules greater than or equal to 3 mm that three or four radiologists agreed on were retained. The nodules were ranked by malignancy from level 1 to level 5. Levels 1 and 2 were labeled benign, and levels 4 and 5 were labeled malignant. Samples with malignancy level 3 were excluded to make a clear distinction between benign and malignant. Overall, 1279 samples were selected for the experiments, of which 569 were nonnodules, 278 were benign, and 432 were malignant. Figure 2 shows an example of a CT image, where the lung nodule is marked by a red circle. The right side of the figure shows the ground truths (GTs), that is, the four radiologists' segmentations of the nodule region of interest (NROI). The figure shows that the radiologists’ segmentations of the same sample differ.
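The selection rules above can be written as a small filter. In this sketch, the `Candidate` record and its fields are hypothetical. Each nodule is assumed to carry a single combined malignancy level, because the text does not say how the four radiologists' ratings were combined. Nonnodule (normal) samples are not handled here, since their selection is not described in the same detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record for one annotated nodule."""
    slice_thickness_mm: float
    size_mm: float
    num_readers: int  # radiologists (out of four) who marked the nodule
    malignancy: int   # combined malignancy level 1-5 (combination rule assumed)


def assign_label(c: Candidate) -> Optional[str]:
    """Return 'benign' or 'malignant', or None if the nodule is excluded."""
    if c.slice_thickness_mm > 3.0:
        return None  # scans thicker than 3.0 mm are removed
    if c.size_mm < 3.0 or c.num_readers < 3:
        return None  # keep nodules >= 3 mm marked by three or four radiologists
    if c.malignancy in (1, 2):
        return "benign"
    if c.malignancy in (4, 5):
        return "malignant"
    return None  # level 3 is left out to separate benign from malignant clearly


print(assign_label(Candidate(2.0, 6.5, 4, 5)))  # → malignant
print(assign_label(Candidate(2.0, 6.5, 4, 3)))  # → None
```

The class counts stated above are consistent: 569 + 278 + 432 = 1279.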

Nodule candidate regions are extracted slice by slice from the LIDC scans. A mask ensures that the pixels of each candidate nodule retain their original values, and the candidate is zero-padded to a size of  as described in [41]. Finally, all the samples are resized to .
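The masking and zero-padding step can be sketched as follows. The sketch assumes a 2D candidate patch with a binary mask of the same shape and centres the masked patch on a square canvas of zeros. `canvas` is a hypothetical parameter because the padded size is not given here, and the placement used in [41] may differ. The final resizing step is omitted.

```python
def mask_and_pad(patch, mask, canvas):
    """Zero pixels outside the binary mask and centre the patch on a
    canvas x canvas grid of zeros (canvas is a hypothetical size)."""
    h, w = len(patch), len(patch[0])
    if h > canvas or w > canvas:
        raise ValueError("patch is larger than the canvas")
    top, left = (canvas - h) // 2, (canvas - w) // 2
    out = [[0] * canvas for _ in range(canvas)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:  # nodule pixel: keep its original value
                out[top + r][left + c] = patch[r][c]
    return out


for row in mask_and_pad([[5, 6], [7, 8]], [[1, 0], [1, 1]], canvas=4):
    print(row)
```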

3.2.3. Data Augmentation

The number of samples was insufficient for proper training of the CNN, and the classes were also imbalanced. Therefore, we used data augmentation to increase the number of samples and balance the classes. Augmentation was applied only to the training data. Only rotation and translation were used. The samples were rotated by random angles between 10° and 60° and translated within a range of [-2, 2].
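The following is a minimal augmentation sketch under stated assumptions. It rotates the image about its centre with nearest-neighbour sampling, applies integer pixel shifts (the unit of the [-2, 2] range is not stated), and uses zero fill for pixels that fall outside the source image. The function names are hypothetical. A production pipeline would typically use an image processing library instead of this pure-Python loop.

```python
import math
import random


def rotate_translate(img, angle_deg, dx, dy, fill=0):
    """Rotate a 2D image by angle_deg about its centre, then shift it by (dx, dy)
    pixels, using nearest-neighbour inverse mapping; uncovered pixels get fill."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Undo the shift, then apply the inverse rotation to find the source pixel.
            x, y = c - dx - cx, r - dy - cy
            src_c = round(cx + x * cos_a + y * sin_a)
            src_r = round(cy - x * sin_a + y * cos_a)
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


def random_augment(img, rng):
    """One augmented copy: angle in [10, 60] degrees, integer shifts in [-2, 2]."""
    angle = rng.uniform(10.0, 60.0)
    dx, dy = rng.randint(-2, 2), rng.randint(-2, 2)
    return rotate_translate(img, angle, dx, dy)


print(rotate_translate([[1, 2, 3]], 0, 1, 0))  # → [[0, 1, 2]]
sample = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
augmented = random_augment(sample, random.Random(0))
```

Consistent with the text, a function like `random_augment` would be called only on training samples, and more often for the minority classes to balance the class counts.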

4. Experimental Results and Discussion

The experiments used 10-fold cross-validation. As described earlier, we removed level 3 samples to make a clear distinction between benign and malignant samples. However, in two additional sets of experiments, we included the level 3 samples. This gave three sets. Set 1 excluded level 3 samples, set 2 included them in the benign category, and set 3 included them in the malignant category. Set 1 had a total of 1279 samples, of which 569 were normal, 278 were benign, and 432 were malignant. Set 2 had a total of 1508 samples, of which 507 were benign and 432 were malignant. Set 3 had 1508 samples, of which 278 were benign and 661 were malignant. Figure 3 illustrates the accuracy of the proposed system on the three sets: 94.65% for set 1, 89.21% for set 2, and 73.4% for set 3. These results suggest that level 3 samples are closer to benign than to malignant samples. In the subsequent experiments, we use only set 1.
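The three label mappings and the 10-fold split can be sketched as follows. The helper names are hypothetical, and the split is a plain shuffled split. The text does not say whether the folds were stratified by class. The totals above also determine the number of level 3 samples: 1508 - 1279 = 229. Set 2 therefore has 278 + 229 = 507 benign samples, and set 3 has 432 + 229 = 661 malignant samples.

```python
import random


def make_label(level, set_id):
    """Class of a nodule with malignancy level 1-5 in experiment set 1, 2, or 3;
    None means the sample is excluded (level 3 in set 1)."""
    if level in (1, 2):
        return "benign"
    if level in (4, 5):
        return "malignant"
    return {1: None, 2: "benign", 3: "malignant"}[set_id]


def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


folds = kfold_indices(1279)  # size of set 1
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    # Augment only train_idx samples, fit the model, and evaluate on test_idx.
```

In each of the ten rounds, one fold is held out for testing, and augmentation touches only the other nine folds.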

Figure 4 displays the confusion matrix of the system on set 1. The matrix shows that the normal class was generally not confused with the benign or malignant classes. Some benign and malignant samples were confused with each other, and malignant samples were confused the most.

We also computed the recall and precision of the system from the confusion matrix. The average recall was 98.07%, and the average precision was 98.06%. Figure 5 shows the ROC curve of the system. The area under the curve was 0.987, which is considered very good. Figure 6 illustrates the learning curves of the system in terms of accuracy and loss. Both become steady after some iterations. Figure 7 shows some malignant samples that were misclassified as benign. The misclassified samples did not share any specific characteristic. However, fading boundaries and nodule size could contribute to such misclassification. This requires further investigation.
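Recall and precision both follow from the confusion matrix. Recall divides each diagonal entry by its row sum (the true class), and precision divides it by its column sum (the predicted class). The averages reported above are presumably macro averages over the three classes, although the text does not say so. The matrix in this sketch is hypothetical and is not taken from Figure 4.

```python
def recall_precision(cm):
    """Per-class recall and precision from a confusion matrix where
    cm[i][j] counts samples of true class i predicted as class j."""
    n = len(cm)
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    precision = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    return recall, precision


def macro(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Hypothetical counts (rows/columns: normal, benign, malignant), not Figure 4.
cm = [[50, 0, 0],
      [0, 45, 5],
      [0, 10, 40]]
recall, precision = recall_precision(cm)
print(f"macro recall {macro(recall):.4f}, macro precision {macro(precision):.4f}")
```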

Table 1 compares the performance of the proposed system with that of some recent related systems that used deep learning. All the compared systems used the same LIDC database, but the number of samples varied. The results of these systems were taken from the corresponding papers. The table shows that the proposed system achieved the highest accuracy. The closest accuracy was obtained by the system in [36], which used three streams and fused hand-crafted features with CNN features using an AdaBoost neural network. Therefore, the system in [36] is computationally intensive.

The proposed architecture requires 275 MFLOPs and has approximately 200K parameters. By contrast, AlexNet requires around 1.5 GFLOPs and has 60 million parameters, and GoogLeNet requires around 3 GFLOPs and has 7 million parameters. Therefore, the proposed architecture is very light compared to these well-known architectures. All the experiments in this paper were carried out on a quad-core machine with 12 GB of RAM and an NVIDIA GeForce GTX 1050 GPU.
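The relative costs can be checked with simple arithmetic on the figures quoted above. No new measurements are involved. AlexNet needs about 5.5 times the FLOPs and 300 times the parameters of the proposed model. GoogLeNet needs about 10.9 times the FLOPs and 35 times the parameters.

```python
# FLOPs per forward pass and parameter counts as quoted in the text.
MODELS = {
    "proposed": (275e6, 200e3),
    "AlexNet": (1.5e9, 60e6),
    "GoogLeNet": (3e9, 7e6),
}


def relative_cost(models, baseline="proposed"):
    """Ratio of each model's FLOPs and parameters to those of the baseline."""
    base_flops, base_params = models[baseline]
    return {name: (f / base_flops, p / base_params) for name, (f, p) in models.items()}


for name, (flop_ratio, param_ratio) in relative_cost(MODELS).items():
    print(f"{name:9s} {flop_ratio:5.1f}x FLOPs {param_ratio:6.1f}x parameters")
```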

5. Conclusion

The use of mobile computing, cognitive computing, machine learning, and healthcare data analytics greatly influences our lives. In this context, we proposed a pulmonary nodule detection and classification system using a light CNN model. The system was evaluated using samples from the LIDC database. It achieved 97.9% accuracy when samples with malignancy level 3 were excluded from the experiments. The average recall and precision values were above 98%. Compared to other state-of-the-art systems, the proposed system achieved higher accuracy. In future work, we aim to visualize the nodule boundaries. We also plan to fuse features from different layers of the CNN architecture to enhance accuracy. Another direction is to use active learning to improve performance [42].

Data Availability

Not applicable.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This study was funded by the Deanship of Scientific Research, Taif University, KSA (Research Project number 1-440-6146).