1 Introduction

The outbreak of novel coronavirus disease 2019 (COVID-19) has rapidly spread to almost every country in the world [9]. To date (September 9, 2020), there have been 27,775,319 confirmed cases and 902,665 deaths worldwide. The epidemic has led global health organizations to raise public health concerns and to seek viable methods for detecting COVID-19 as early as possible.

Currently, the most common diagnosis of COVID-19 is still based on real-time reverse-transcriptase polymerase chain reaction (RT-PCR), which has been adopted as the gold standard for confirming infection. However, the sensitivity of RT-PCR is unsatisfactory, with a lowest reported accuracy of only 60% [2, 10, 17]. The current workaround is therefore to conduct multiple consecutive tests to increase the true positive rate. In contrast, medical-imaging-based methods such as chest computed tomography (CT) and chest X-ray (CXR) can be a more reliable way of confirming positive COVID-19 patients.

On the one hand, CXR imaging is inexpensive, fast and widely available, but previous studies found it unable to detect acute pneumonia [4, 6, 28]. On the other hand, CT imaging is now clinically adopted as the principal way to confirm positive or suspected-positive COVID-19 cases [11]. Early studies [12] have analyzed and compared the effectiveness and limitations of both methods in the diagnosis and management of COVID-19 patients. The study in [13] analyzed 1099 confirmed COVID-19 inpatients and outpatients. On the CT scans taken at hospital admission, 86.2% showed abnormalities. The most common CT findings in COVID-19 patients were ground-glass opacity (56.4%) and bilateral patchy shadowing (51.8%). The study showed that a patient with pneumonia due to COVID-19 may show lung abnormalities on chest CT even when the initial RT-PCR test is not positive. Chest CT achieves higher sensitivity for the diagnosis of COVID-19 than initial RT-PCR on pharyngeal swab samples. The sensitivity of chest CT in suggesting COVID-19 has been reported to be 97% [2].

Nevertheless, inconsistent results between CT and RT-PCR are commonly observed, especially in patients who have recovered from the virus infection and show negative RT-PCR results. To alleviate the problem, CT examination has been shown to be necessary for diagnosis. According to the latest guideline, Diagnosis and Treatment of Pneumonitis Caused by 2019-nCoV (trial 7th version), published by the Chinese government [25], a case can be confirmed as clinically diagnosed if the imaging features are discovered by CT at an early stage. In addition, recent studies [3, 7, 26] also rely on chest CT to develop accurate and efficient methods for diagnosing COVID-19.

Most recent works have employed deep learning models to identify and localize the areas infected by COVID-19 [1, 5, 8, 15, 20, 22, 23, 24, 30]. Among them, Wang et al. [29] proposed a method based on Inception transfer learning to detect patients with COVID-19; it extracts graphical features of COVID-19 and achieves an accuracy of \(79.3\%\) on external testing data. The team collected 1065 CT images, including 325 images with COVID-19 and 740 images with other typical viral pneumonia. Fu et al. [12] trained a model based on the ResNet network, which obtained an accuracy of \(98.8\%\). The study collected five types of images: COVID-19 pneumonia (50 cases), non-COVID-19 viral pneumonia (52 cases), bacterial pneumonia (53 cases), pulmonary tuberculosis (54 cases), and normal lung (100 cases). Zheng et al. [34] developed a weakly supervised deep learning approach for COVID-19 detection that uses 3D CT volumes to predict the probability of COVID-19 infection; its commendable performance benefits from the 3D deep neural network. The authors trained on 499 CT volumes and reported an accuracy of \(84\%\). Ying et al. [33] first extracted the main regions of the lungs and then designed a Details Relation Extraction neural network (DER-Net) to train the model. The team then developed a deep learning-based CT diagnosis system with a detection time of 30 s. CT scans from 88 patients with COVID-19, 101 patients with bacterial pneumonia and 86 healthy persons were collected for the classification task. Jin et al. [16] presented an artificial intelligence (AI) method based on CT imaging to assist the screening of COVID-19. This research reduced the burden on radiologists and implemented a model that performs segmentation and classification simultaneously. Li et al. [21] developed a framework to detect patients with COVID-19. Their work used RT-PCR as the ground truth, even though this choice is questionable given the low accuracy of RT-PCR. More recently, Shi et al. [27] reviewed artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19. The reviewed studies enrolled \(44-1658\) patients with COVID-19 and \(55-1551\) patients with other conditions, including bacterial pneumonia, viral pneumonia, Influenza-A, community-acquired pneumonia, non-pneumonia and normal controls. Finally, the review reported the classification accuracies of several popular deep learning models.

On the whole, the aforementioned methods suffer from the following deficiencies: 1) The training datasets are small. For example, only 275 positive COVID-19 scans are used in [32]. Although data augmentation techniques are adopted, the trained model tends to over-fit, which in turn harms its generalization capability; 2) Unbalanced training samples bias the classifier toward the category with significantly more samples; 3) The time required for diagnosis is often lengthy.

To facilitate online testing for COVID-19 through CT images, this paper proposes a deep learning method to classify and localize areas infected by coronavirus in chest CT images. The main contributions of the proposed method are as follows: 1) No data augmentation is required to accurately discriminate COVID-19 from normal controls, with a specificity of 0.92 and a sensitivity of 0.93; 2) The raw DICOM image is not necessary for testing. A highly compressed image format such as JPEG can be used to allow a quick diagnosis; 3) It identifies the virus infection within 6 s and thus allows online testing at very low cost. Furthermore, we applied our model to 48 asymptomatic patients diagnosed with COVID-19; 4) On a large patient cohort, the proposed method achieves substantial performance.

The rest of the paper is organized as follows: Sect. 2 introduces the materials and methods of this study in detail; Sect. 3 evaluates the classification performance of our model on CT images; Sect. 4 draws the conclusion.

Fig. 1 Illustrative examples of different image types as manifested in CT

2 Materials and Methods

2.1 Training Data Acquisition on COVID-19 Patients and Non-COVID-19

The COVID-19 CT dataset utilized in this study is from Huangpi Hospital of Traditional Chinese Medicine, Wuhan, China. The dataset contains both medical imaging data and clinical data. Medical imaging data are presented as lung-window CT of hospitalized patients. This retrospective study has been approved by the ethics committee of the First Affiliated Hospital of Guangxi Medical University, and the requirement for written informed consent was waived (Approval No. 2020 (KY-E-082)).

The dataset contains chest CT exams and clinical laboratory information from 1091 patients with COVID-19 collected between Jan 27, 2020 and Mar 21, 2020. The recorded clinical information included temperature, blood pressure, fever, dry cough, fatigue, headache, chest distress, pharyngeal pain, etc. All laboratory results (including routine blood tests, high-sensitivity C-reactive protein measurement, antibody tests, etc.) were also recorded. All the patients were confirmed by RT-PCR. Characteristics of the 1082 included patients (excluding 9 patients of unknown age and gender) are given in Table 1.

All chest CT examinations were conducted on GE BRIVO CT 325 or Siemens SOMATOM Definition Flash CT scanners. The statistics of the collected CT images of different types are summarized in Fig. 1. Each volumetric exam contains about \(50-400\) CT slices with a slice thickness varying from 1.3 mm to 3 mm. The reconstruction matrix of each CT image is \(512\times 512\) pixels with an in-plane pixel spatial resolution of \(1.5 \times 1.5\,{\text{mm}}^{2}\). Every patient who was infected or suspected of being infected with COVID-19 underwent one or more lung CT scans during hospitalization. The average number of lung CT scans per patient is 2.1, with a range from 1 to 5. In our experiment, we treated the CT images of each patient taken at different times as independent samples, because they independently represent the image characteristics affected by COVID-19, as shown in Fig. 2. The training and testing datasets are disjoint. Consequently, the number of valid COVID-19 CT exams is 2267. Figure 3 shows three COVID-19 cases, each scanned by CT at three different time points (two-day intervals).

Fig. 2 Different stages of COVID-19

Fig. 3 Three cases diagnosed with COVID-19. The figure intuitively indicates the change of the lesions of patients over time

2.2 Testing Data Collection on COVID-19 Patients and Non-COVID-19

To verify the robustness of the model, two independent datasets are collected and tested. The first dataset is reported and analyzed in [32] and contains 275 positive cases and 195 normal cases. The second dataset is collected from the radiology department of the First Affiliated Hospital of Guangxi Medical University, Guangxi, China. The chest CT examinations were conducted on the GE LightSpeed VCT scanner, GE Revolution CT 256-slice scanner, Siemens SOMATOM Definition Flash CT scanner or Siemens SOMATOM Perspective CT scanner. The scanning parameters are as follows: reconstruction matrix of the lung window, \(1001 \times 762\); slice thickness, 1.3 mm to 2 mm. It consists of 40 normal samples from CT examinations performed during physical examinations for COVID-19.

The final dataset consists of 1357 (\(52.4\%\)) exams for positive samples and 1235 (\(47.6\%\)) negative samples (i.e. Non-COVID-19).

Table 1 Statistics on the collected dataset on 1082 COVID-19 patients

2.3 Classification of COVID-19 Patients via Deep Learning Model

2.3.1 Image Preprocessing

Preprocessing is important both to lower the heavy computational cost of the deep classification model during training and testing and to facilitate a real-time online diagnosis process. First, the raw DICOM images are compressed from 300 MB to 60 MB, which drastically reduces the input size of the model. Second, we sorted the slices of each patient from top to bottom according to the serial number and randomly selected 6 images for COVID-19 patients and 12 images for Non-COVID-19 patients; only slices covering the lung area were selected. Third, all data were normalized to remove the variations caused by the multi-source collection. Fourth, while ensuring the integrity of the lungs, we cropped 50 pixels from the top and bottom of each image to reduce the impact of the background. We did not perform lung segmentation on the selected images, to avoid removing lesion areas at the lung boundaries. Finally, we resized each image to \(299\times 299\) as input for the base model, ResNet50.
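The preprocessing steps described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' pipeline: the function names, the min-max form of the normalization, the nearest-neighbor resizing and the reading of "cropping up and down by 50 pixels" as removing 50 rows at the top and bottom are all assumptions, since the text does not specify them. A practical implementation would more likely rely on NumPy and an imaging library.

```python
import random


def select_slices(num_slices, k, seed=0):
    """Randomly pick k slice indices and return them in top-to-bottom order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_slices), min(k, num_slices)))


def normalize(image):
    """Min-max scale pixel values to [0, 1] (assumed form of normalization)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    scale = (hi - lo) or 1  # avoid division by zero on constant images
    return [[(v - lo) / scale for v in row] for row in image]


def crop_rows(image, margin=50):
    """Remove `margin` rows from the top and from the bottom of the image."""
    return image[margin:len(image) - margin]


def resize_nearest(image, size=299):
    """Resize a 2-D image to size x size by nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def preprocess(image):
    """Normalize, crop and resize one slice, in the order given in the text."""
    return resize_nearest(crop_rows(normalize(image)))


if __name__ == "__main__":
    # Synthetic 512 x 512 slice standing in for a real CT image.
    fake_slice = [[r + c for c in range(512)] for r in range(512)]
    out = preprocess(fake_slice)
    print(len(out), len(out[0]))        # 299 299
    print(select_slices(120, 6))        # six sorted slice indices
```

The slice count passed to `select_slices` would be 6 or 12 depending on the class, following the text.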

2.3.2 Deep Learning Model

Our model is developed on the Keras platform and is based on the ResNet50 architecture. To alleviate the tendency to over-fit, speed up convergence and improve robustness, the model is pre-trained on the ImageNet dataset and then fine-tuned on our dataset. The proposed model contains a feature extractor and a classifier. The structure of the feature extractor remains unchanged. The classifier is supplemented by an additional convolutional layer to reduce the number of output categories to match our classes (COVID-19 and non-COVID-19). The network is retrained from the ImageNet weights to ensure good performance. A stochastic gradient descent (SGD) optimizer is employed with a learning rate of 0.001. To avoid over-fitting, we employed dropout after the fully-connected layer. Moreover, training is stopped early after 50 epochs. The architecture of the proposed model is shown in Fig. 4.
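A rough Keras sketch of such a model is given below. Only the ResNet50 backbone, the ImageNet initialization, the \(299\times 299\) input, the two output classes, the SGD learning rate of 0.001 and the 50-epoch limit come from the text. The width of the extra convolutional layer, the size of the fully-connected layer, the dropout rate and the loss function are not stated and are assumptions chosen purely for illustration, as is the function name `build_model`.

```python
INPUT_SHAPE = (299, 299, 3)  # from the text: inputs resized to 299 x 299
NUM_CLASSES = 2              # COVID-19 vs. non-COVID-19
LEARNING_RATE = 0.001        # from the text
MAX_EPOCHS = 50              # from the text: training stops after 50 epochs
DROPOUT_RATE = 0.5           # assumption: the rate is not stated


def build_model(weights="imagenet"):
    """Build a ResNet50-based two-class classifier (illustrative sketch)."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    # Feature extractor: unchanged ResNet50 without its 1000-class head.
    base = keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=INPUT_SHAPE
    )

    # Classifier head. Layer sizes below are assumptions, not from the text.
    x = layers.Conv2D(256, kernel_size=1, activation="relu")(base.output)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(DROPOUT_RATE)(x)  # dropout after the dense layer
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",  # assumption: loss is not stated
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    model = build_model()
    model.summary()
    # Training would then be run with something like
    # model.fit(train_data, validation_data=val_data, epochs=MAX_EPOCHS)
```

Keras' ResNet50 defaults to \(224\times 224\) inputs, but with `include_top=False` it accepts other sizes such as the \(299\times 299\) images used here.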

Fig. 4 Architecture of the proposed model

3 Experiments

Once the deep learning model is trained, new test samples can be processed quickly. The average processing time for each CT slice is 9.857 milliseconds on a workstation (GPU NVIDIA GeForce GTX 1070 8 GB, RAM 48 GB, and Intel Core (TM) i7-7700 CPU 3.6GHz).
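As a back-of-the-envelope illustration of what the per-slice time implies for a whole exam, the short calculation below multiplies it by the slice counts quoted in Sect. 2.1. It assumes that every slice of an exam is passed through the network and ignores file loading and preprocessing time, neither of which is stated in the text; the helper name `exam_seconds` is hypothetical.

```python
MS_PER_SLICE = 9.857  # average processing time per CT slice, from the text


def exam_seconds(num_slices, ms_per_slice=MS_PER_SLICE):
    """Estimated network time for one exam, in seconds."""
    return num_slices * ms_per_slice / 1000.0


if __name__ == "__main__":
    for n in (50, 400):  # slice range per volumetric exam from Sect. 2.1
        print(f"{n} slices: {exam_seconds(n):.2f} s")
```

Under these assumptions an exam takes roughly 0.5 s to 3.9 s of network time.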

3.1 Experimental Setting and Performance Evaluation

The performance of the model is evaluated on the training, validation and testing sets. Table 2 summarizes the distribution of the dataset. The distribution of the study population is shown in Fig. 5.

Fig. 5 Schematic diagram of the distribution of patients

Table 2 Summary of training, validation and testing datasets

Six widely used metrics, namely accuracy (ACC), sensitivity (SENS), specificity (SPE), intersection over union (IoU), the area under the receiver operating characteristic curve (AUC) and F1-score (F1), were employed to measure the performance of the method. These quantitative metrics are defined as follows:

$$\begin{aligned} ACC = & {} \frac{TP+TN}{TP + FP + FN +TN} \end{aligned}$$
(1)
$$\begin{aligned} SPE = & {} \frac{TN}{TN + FP}\end{aligned}$$
(2)
$$\begin{aligned} SENS = & {} \frac{TP}{TP + FN}\end{aligned}$$
(3)
$$\begin{aligned} IoU = & {} \frac{TP}{TP+FP+FN}\end{aligned}$$
(4)
$$\begin{aligned} F1 = & {} \frac{2PR}{P+R}\end{aligned}$$
(5)
$$\begin{aligned} AUC = & {} \frac{\sum _{i \in \text{positive class}} \mathrm{rank}_{i}-\frac{M(1+M)}{2}}{M \times N} \end{aligned}$$
(6)

where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, P and R denote precision and recall, M and N denote the numbers of positive and negative samples, respectively, and \(\mathrm{rank}_{i}\) is the rank of positive sample i when all samples are sorted by predicted score in ascending order. We also plotted the receiver operating characteristic (ROC) curve for each method and estimated the area under it (AUC).
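Equations (1)-(6) can be computed directly from the confusion-matrix counts and the predicted scores, as in the following sketch. The function names are hypothetical, tied scores are given their average rank (an assumption, since the text does not discuss ties), and the counts and scores in the example are made-up toy values, not results from this study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """ACC, SPE, SENS, IoU and F1 as defined in Eqs. (1)-(5)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "SPE": tn / (tn + fp),
        "SENS": recall,
        "IoU": tp / (tp + fp + fn),
        "F1": 2 * precision * recall / (precision + recall),
    }


def rank_auc(scores, labels):
    """AUC from the rank-sum formula of Eq. (6); labels are 1 (positive) or 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    m = sum(labels)            # number of positive samples (M)
    n = len(labels) - m        # number of negative samples (N)
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)


if __name__ == "__main__":
    # Toy values for illustration only.
    print(confusion_metrics(tp=90, fp=10, tn=80, fn=20))
    print(rank_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In the toy AUC example, the two positive samples have ranks 2 and 4, so the numerator is \(6 - 3 = 3\) and the AUC is \(3/(2 \times 2) = 0.75\).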

3.2 Classification of COVID-19 from Normal Controls

The numerical results of the experiment are reported in Table 3. The receiver operating characteristic (ROC) curve is plotted in Fig. 6. Four examples of lesion localization are shown in Fig. 7.

Fig. 6 COVID-19 and Non-COVID-19 detection results evaluated via the receiver operating characteristic (ROC) curve

Fig. 7 Four examples of the localization of the lesions for the COVID-19 patients

Table 3 The quantitative results in classification of COVID-19 and Non-COVID-19

3.3 Recognition of the Asymptomatic Patients Confirmed as COVID-19

It has been reported that there is a rising number of asymptomatic patients confirmed with COVID-19 [18, 31]. According to the latest guideline, asymptomatic infection refers to those who have no relevant clinical symptoms (such as fever, cough, sore throat, etc.), yet test positive by RT-PCR or antibody tests on specimens such as those from the respiratory tract. Asymptomatic infections are not counted as diagnosed patients, because they are not regarded as new infections. Asymptomatic people can nevertheless become a source of infection, with a certain risk of transmission.

Unilateral ground-glass opacities have been observed in asymptomatic patients with COVID-19 pneumonia; in other words, asymptomatic patients probably exhibit CT changes before symptom onset [19]. This finding suggests that CT is a sensitive modality for detecting COVID-19 pneumonia, even in asymptomatic patients. CT images of asymptomatic COVID-19 patients therefore have certain imaging characteristics, and chest CT scans have important diagnostic value for close contacts of infected people.

In some patients, RT-PCR may be negative while early CT already shows typical lung consolidation [14]. Consequently, a comparative experiment on the detection accuracy of RT-PCR and CT imaging is performed. The results for asymptomatic patients with COVID-19 are summarized in Table 4.

Table 4 Summary of asymptomatic patients with COVID-19

4 Conclusion and Discussion

This paper proposes a classification and localization method for coronavirus-infected areas, evaluated on a large patient cohort. The proposed method has the following key advantages over previous related works: (1) No data augmentation is needed to accurately discriminate COVID-19 from normal controls, with a specificity of 0.92 and a sensitivity of 0.93; (2) The raw DICOM image is compressed to allow a quicker diagnosis procedure and online testing; (3) On 48 asymptomatic patients diagnosed with COVID-19, our method has shown substantial performance.

Extensive experiments indicate that: (1) The positive rate of the RT-PCR assay in our collected patients is 63.5% (687/1082); (2) Taking the diagnosis from clinical indicators together with the joint detection by RT-PCR and CT as ground truth, the classification of COVID-19 versus Non-COVID-19 achieves an accuracy, sensitivity, specificity, IoU, F1-score and AUC of 93%, 93%, 92%, 85%, 92% and 93%, respectively; (3) For the 48 asymptomatic patients, the RT-PCR assay was negative in 45.8% (22/48) at early diagnosis, whereas the accuracy of CT scans was 95.8% (46/48).
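The proportions quoted above can be reproduced from the stated counts with a few lines of arithmetic. The snippet below is only a consistency check on the reported fractions; the helper name `percent` and the dictionary labels are hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the text.
reported = {
    "RT-PCR positive rate, all patients": (687, 1082),
    "RT-PCR negative, asymptomatic patients": (22, 48),
    "CT accuracy, asymptomatic patients": (46, 48),
}

if __name__ == "__main__":
    for name, (num, den) in reported.items():
        print(f"{name}: {num}/{den} = {percent(num, den)}%")
```

The three fractions evaluate to 63.5%, 45.8% and 95.8%, matching the figures in the text.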

In future work, we aim to detect asymptomatic patients. We plan to combine a statistical analysis of the clinical indicators of asymptomatic patients with imaging characteristics to find the differences between such patients and symptomatic patients.