Chest x-ray automated triage: A semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures

https://doi.org/10.1016/j.cmpb.2021.106130

Highlights

  • Detection of four main radiological findings in chest x-rays with a semiologic approach.

  • Images with different types of labels were exploited by using a late fusion of four convolutional architectures.

  • Trained on heterogeneous data from a combination of public and institutional datasets.

  • Achieved an area under the curve of 0.87 for the detection of abnormality in a local retrospective collection of chest x-rays.

  • Designed as a clinically useful tool that could be successfully integrated into a hospital workflow.

Abstract

Background and objectives

The multiple chest x-ray datasets released in recent years have ground-truth labels intended for different computer vision tasks, suggesting that performance in automated chest x-ray interpretation might improve with a method that can exploit diverse types of annotations. This work presents a Deep Learning method based on the late fusion of different convolutional architectures, which allows training on heterogeneous data with a simple implementation, and evaluates its performance on independent test data. We focused on obtaining a clinically useful tool that could be successfully integrated into a hospital workflow.

Materials and methods

Based on expert opinion, we selected four target chest x-ray findings, namely lung opacities, fractures, pneumothorax and pleural effusion. For each finding we defined the most suitable type of ground-truth label, and built four training datasets combining images from public chest x-ray datasets and our institutional archive. We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool. The performance was measured on two test datasets: an external openly-available dataset, and a retrospective institutional dataset, to estimate performance on the local population.

Results

The external and local test sets comprised 4376 and 1064 images, respectively, on which the model achieved an area under the Receiver Operating Characteristic curve of 0.75 (95% CI: 0.74–0.76) and 0.87 (95% CI: 0.86–0.89) for the detection of abnormal chest x-rays. For the local population, a sensitivity of 86% (95% CI: 84–90) and a specificity of 88% (95% CI: 86–90) were obtained, with no significant differences between demographic subgroups. We present example heatmaps, for both true and false positives, to illustrate the level of interpretability achieved.
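Confidence intervals such as those above are commonly obtained by bootstrap resampling of the test set. The sketch below is a hypothetical illustration of that procedure, not the authors' code: it computes the ROC AUC via the Mann–Whitney formulation and a percentile bootstrap interval with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney formulation:
    the fraction of (positive, negative) pairs ranked correctly."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties  # ties count as half a correct ranking

def bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the AUC."""
    stats = []
    n = len(y_true)
    while len(stats) < n_boot:
        idx = rng.integers(0, n, n)  # resample cases with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample contains a single class; AUC undefined
        stats.append(auc(y_true[idx], y_score[idx]))
    return tuple(np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)]))

# Toy example: 3 normal and 3 abnormal studies
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.7, 0.35, 0.8, 0.65, 0.9])
print(auc(y_true, y_score))  # 8/9 of the pairs are ranked correctly
lo, hi = bootstrap_ci(y_true, y_score, n_boot=500)
```

In practice the interval is computed on thousands of test images, which is what makes intervals as narrow as 0.86–0.89 attainable.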

Conclusion

This study presents a new approach for exploiting heterogeneous labels from different chest x-ray datasets, by choosing Deep Learning architectures according to the radiological characteristics of each pathological finding. We estimated the tool's performance on the local population, obtaining results comparable to state-of-the-art metrics. We believe this approach is closer to the actual reading process of chest x-rays by professionals, and therefore more likely to be successful in a real clinical setting.

Introduction

Chest radiography (CXR) is one of the most commonly performed and well-established imaging modalities, playing an essential role in diagnosis and monitoring in both primary and specialty care [1]. At the same time, the increasing clinical demand on radiology departments worldwide is challenging current service delivery models, particularly in publicly funded health care systems. In some settings it may not be feasible to report all acquired radiographs promptly, leading to large backlogs of unreported studies [2,3]. Alternative models of care should be explored, particularly for chest radiographs, which account for 40% of all diagnostic images worldwide [4]. Automated CXR interpretation could improve workflow prioritization in radiology departments and serve as clinical decision support for non-imaging medical specialists, while opening the path for screening initiatives at the population scale.

Computer vision is the interdisciplinary scientific field that seeks to automate the understanding of images by performing tasks that the human visual system can do [5]. The success of Deep Learning (DL) algorithms for computer vision tasks, mainly with convolutional neural networks (CNNs), has led to a rapid adoption of these techniques in medical imaging research, boosting academic works that apply DL to diagnosis tasks, and promoting the building of large labeled medical imaging datasets that are needed to train these algorithms.

In the field of computer vision, different types of ground-truth labels are used to address different supervised tasks: classification CNNs are fed with image-level labels, object detection CNNs with bounding-box labels, and segmentation CNNs with pixel-level masks. Most large CXR datasets [6–8] have disease annotations in the form of positive or negative labels indicating the presence or absence of 14 findings that can appear in a CXR. Another recently released dataset organizes labels as a hierarchical family of diseases [9]. Following the release of these large CXR datasets, the use of DL for automated CXR classification has been widely explored by the scientific community. Because of the type of labels available, most research applying CNNs to CXRs uses classification architectures [6,7,10–14], such as ResNet [15] or DenseNet [16]. However, the translation of DL tools for automated CXR interpretation to real clinical scenarios faces many challenges and is still poorly achieved in practice [17]. We believe this is related to two issues, which motivated this work: the multi-pathological detection approach, and the lack of large datasets with localization labels, both discussed below.

In the first place, the difficulties that DL faces in CXR interpretation could be related to the multi-pathological approach of previous works. The use of DL in health has shown better results when applied to narrow tasks [18], raising the concern that detecting multiple diseases with the same underlying CNN model might be ineffective. In clinical radiology, CXR is considered a screening tool rather than a differential diagnosis tool [19]. It orients the diagnostic process and the choice of further medical studies, as a CXR study is usually insufficient to identify a specific disease with certainty. It remains one of the most complex imaging studies to interpret, being subject to significant inter-reader variability and suboptimal sensitivity for critical clinical findings. Numerous pathologies are visually similar, leading to considerable variability between CXR reports, even among expert radiologists [20]. This reinforces the idea that multi-pathological CXR classification is clinically inappropriate. We hypothesize that, to tackle this issue, image interpretation should emulate the reading process of CXR by radiologists, recognizing radiological patterns and signs rather than individualizing multiple diseases [21]. In medicine, the study of signs is called semiology or semiotics. Radiological semiology refers to the description of certain imaging signs that can be observed and interpreted by an expert, and it plays a fundamental role in diagnostic imaging [22,23]. In this regard, we propose a semiology-based detection, replacing the traditional 14-findings classification.

In the second place, image-level labels carry no information on the localization of a finding within the image. As most labels are assigned by automatic text mining of radiological reports (without professional review), they are subject to labeling errors and might therefore represent an unreliable ground truth. Motivated by these limitations, some CXR datasets have recently been released containing more robust labels, such as bounding boxes around pathological findings or pixel-level masks as regions of interest [24–26]. This enables the use of CNN architectures for object detection or segmentation. To exploit both the localization information of strongly labeled datasets and the size of larger class-labeled datasets, we need algorithms that can be trained using all available labels from these heterogeneous CXR datasets. The generalized adoption of classification architectures for the detection of diseases in CXR seems limited in this respect. Our hypothesis here is that models able to combine labels intended for multiple computer vision tasks could improve the detection of findings.

This work explores a solution that takes advantage of heterogeneous ground-truth labels from different CXR datasets. We present a simple approach that combines DL architectures for various computer vision tasks (image classification, object detection, and segmentation) as a late fusion of models, and provides a unified heatmap as an easily interpretable output for clinicians. The combined model will be referred to as TRx (named after the Spanish acronym for thoracic x-rays). We sought to develop a clinically appropriate tool for computer-aided detection and triage of CXR findings, one that recognizes the main radiological patterns relevant to evaluating CXRs rather than differential diseases. The objectives of the present study were: (1) to report the model development, and (2) to measure model performance in the local population.
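The late-fusion idea can be sketched in a few lines: each finding-specific model (classification, detection, or segmentation) runs independently and is reduced to a probability plus a heatmap, and the fused tool reports per-finding scores, an overall abnormality score, and one combined heatmap. The fusion rules below (maximum over findings, pixel-wise maximum over heatmaps) are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def fuse_predictions(outputs):
    """Late fusion of finding-specific models.
    outputs: dict mapping finding name -> (probability, HxW heatmap)."""
    probs = {name: p for name, (p, _) in outputs.items()}
    # A study is flagged abnormal if any finding-specific model fires.
    abnormal_score = max(probs.values())
    # Pixel-wise maximum merges the per-finding heatmaps into one overlay.
    combined_heatmap = np.maximum.reduce([h for _, h in outputs.values()])
    return probs, abnormal_score, combined_heatmap

# Toy example with 2x2 "heatmaps" for the four target findings
outputs = {
    "opacity":      (0.80, np.array([[0.9, 0.1], [0.0, 0.0]])),
    "fracture":     (0.10, np.array([[0.0, 0.0], [0.2, 0.0]])),
    "pneumothorax": (0.30, np.array([[0.0, 0.4], [0.0, 0.0]])),
    "effusion":     (0.05, np.array([[0.0, 0.0], [0.0, 0.1]])),
}
probs, score, heatmap = fuse_predictions(outputs)
print(score)  # → 0.8
```

Because each sub-model is trained and run independently, this design lets every finding use the label type (class, bounding box, or mask) best suited to it, at the cost of four forward passes per study.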

Section snippets

Study design

We conducted a retrospective study to develop an AI system for automated triage of CXRs and validate its performance in the local population. We followed the CLAIM guideline (Checklist for Artificial Intelligence in Medical Imaging) [27] under the exploratory study category for model creation addressing an AI application for image classification. This checklist was modeled after the STARD guideline and has been extended to address applications of AI in medical imaging, in concordance with the

Datasets

The four training datasets built for this study are summarized in Table 2. Relabeling was needed in three cases: we performed a grouping of classes in the case of lung opacities, bounding-box annotation in the case of fractures, and segmentation in the case of pleural effusion.

The two test datasets are described in Table 3. The external test set contains 4376 frontal CXRs from 1695 patients, combining the validation and test sets released by Majkowska et al. [13]. The local test set contains

Discussion

This study presents a new approach for exploiting heterogeneous labels from different CXR datasets, by choosing CNN architectures according to each pathological finding's semiology. The objective is not to build a tool for differential diagnosis but to detect imaging signs and patterns that can appear in a CXR. We believe this approach is closer to the actual reading process of CXRs by professionals, and therefore more likely to be successful in a real clinical setting.

This study's contribution

Conclusion

The semiologic approach and the late fusion strategy introduced in this work showed promising results in both external and local populations. Further exploration of the use of heterogeneous labels in CXR interpretation is needed, while working with clinically meaningful definitions, in order to take full advantage of available annotated image sets. Regarding TRx, the next step will be a clinical validation on a prospective set of images, with a careful assignment of ground truth.

Declaration of Competing Interest

The authors of the article entitled “Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures” declare no conflicts of interest.

Acknowledgments

We thank the imaging professionals from the Radiology Department at Hospital Italiano de Buenos Aires for their contribution with expert opinion. Additionally, we acknowledge the collaboration of all team members participating in the Program for Artificial Intelligence in Health at this hospital.

This work was supported by the annual research grant provided by Hospital Italiano de Buenos Aires. The Titan V used for this research was donated by the NVIDIA Corporation.

References (49)

  • A.E.W. Johnson, T.J. Pollard, N.R. Greenbaum, M.P. Lungren, C. Deng, Y. Peng, Z. Lu, R.G. Mark, S.J. Berkowitz, and S....
  • L. Yao, E. Poblenz, D. Dagunts, B. Covington, D. Bernard, and K. Lyman. Learning to diagnose from scratch by exploiting...
  • P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, et al....
  • A.G. Taylor et al., Automated detection of moderate and large pneumothorax on frontal chest x-rays using deep convolutional neural networks: a retrospective study, PLoS Med. (2018)
  • A. Majkowska et al., Chest radiograph interpretation with deep learning models: assessment with radiologist-adjudicated reference standards and population-adjusted evaluation, Radiology (2020)
  • M. Annarumma et al., Automated triaging of adult chest radiographs with deep artificial neural networks, Radiology (2019)
  • K. He et al., Deep residual learning for image recognition
  • G. Huang et al., Densely connected convolutional networks
  • B. Allen Jr. et al., A road map for translational research on artificial intelligence in medical imaging: from the 2018 National Institutes of Health/RSNA/ACR/The Academy workshop, J. Am. Coll. Radiol. (2019)
  • B. Kelly, The chest radiograph, The Ulster Med. J. (2012)
  • M.I. Neuman et al., Variability in the interpretation of chest radiographs for the diagnosis of pneumonia in children, J. Hospital Med. (2012)
  • S.M. Ellis et al., The WHO manual of diagnostic imaging: radiographic anatomy and interpretation of the chest and the pulmonary system, World Health Organ. (2006)
  • E.G. Nordio. Radiological methods and bases of radiological semiotics....
  • F. Schiavon et al., Radiological semiotics in the report, Radiol. Report. Clin. Pract. (2008)