Development of a deep learning-based image eligibility verification system for detecting and filtering out ineligible fundus images: A multicentre study

https://doi.org/10.1016/j.ijmedinf.2020.104363

Highlights

  • DLIEVS can accurately detect both poor-quality and poor-location fundus images.

  • DLIEVS can immediately notify the photographer when ineligible images are produced.

  • DLIEVS has the potential to reduce the negative impacts caused by ineligible images.

  • DLIEVS can serve as a pre-screening technique for fundus image-based AI systems.

Abstract

Background

Recent advances in artificial intelligence (AI) have shown great promise in detecting some diseases from medical images. Most studies developed AI diagnostic systems using only eligible images. However, in real-world settings, ineligible images (including poor-quality and poor-location images) that can compromise downstream analysis are inevitable, leading to uncertainty about the performance of these AI systems. This study aims to develop a deep learning-based image eligibility verification system (DLIEVS) for detecting and filtering out ineligible fundus images.

Methods

A total of 18,031 fundus images (9,188 subjects) collected from 4 clinical centres were used to develop and evaluate the DLIEVS for detecting eligible, poor-location, and poor-quality fundus images. Four deep learning algorithms (AlexNet, DenseNet121, Inception V3, and ResNet50) were used to train models, and the best-performing model was selected for the DLIEVS. The performance of the DLIEVS was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, as compared with a reference standard determined by retina experts.
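As a point of reference for how such a four-way comparison could be set up, the following is a minimal sketch (PyTorch/torchvision) that instantiates the four backbones with a three-class head (eligible, poor-location, poor-quality); the paper does not report its exact configurations, so the pretrained-weight choices and head replacements shown here are assumptions.

```python
# Hypothetical sketch: the four compared CNN backbones with a 3-class head.
# Weight choices and head replacements are assumptions, not the authors' setup.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # eligible, poor-location, poor-quality


def build_backbone(name: str) -> nn.Module:
    if name == "alexnet":
        m = models.alexnet(weights="DEFAULT")
        m.classifier[6] = nn.Linear(m.classifier[6].in_features, NUM_CLASSES)
    elif name == "densenet121":
        m = models.densenet121(weights="DEFAULT")
        m.classifier = nn.Linear(m.classifier.in_features, NUM_CLASSES)
    elif name == "inception_v3":
        m = models.inception_v3(weights="DEFAULT")  # expects 299x299 inputs
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
        m.AuxLogits.fc = nn.Linear(m.AuxLogits.fc.in_features, NUM_CLASSES)
    elif name == "resnet50":
        m = models.resnet50(weights="DEFAULT")
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    else:
        raise ValueError(f"unknown backbone: {name}")
    return m


backbones = {n: build_backbone(n)
             for n in ("alexnet", "densenet121", "inception_v3", "resnet50")}
```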

Results

In the internal test dataset, the best algorithm (DenseNet121) achieved AUCs of 1.000, 0.999, and 1.000 for the classification of eligible, poor-location, and poor-quality images, respectively. In the external test datasets, the AUCs of the best algorithm (DenseNet121) for detecting eligible, poor-location, and poor-quality images ranged from 0.999 to 1.000, 0.997 to 1.000, and 0.997 to 0.999, respectively.

Conclusions

Our DLIEVS can accurately discriminate poor-quality and poor-location images from eligible images. This system has the potential to serve as a pre-screening technique to filter out ineligible images obtained in real-world settings, ensuring that only eligible images are applied in subsequent image-based AI diagnostic analyses.

Introduction

Deep learning has revolutionized the field of medical artificial intelligence (AI) owing to its excellent diagnostic performance in detecting various diseases [1,2]. For example, based on deep learning, glaucomatous optic neuropathy can be automatically identified from fundus images with an area under the receiver operating characteristic curve (AUC) of 0.986 [3], and patients with COVID-19 pneumonia can be accurately distinguished from patients with other common types of pneumonia and normal controls using computed tomography images with an AUC of 0.971 [4].

Although image-based deep learning algorithms achieve excellent disease-detection performance in laboratory settings, their real-world performance is uncertain because most studies develop and evaluate deep learning models using only eligible images [[3], [4], [5], [6], [7]]. In real-world settings, multiple factors can generate ineligible images, including poor-quality and poor-location images. In ophthalmology, poor-quality fundus images can result from operator errors, patient noncompliance, hardware imperfections, and obscured optical media [8,9]. Poor-location fundus images can be caused by patients’ poor target fixation during fundus imaging, for example an image that does not show the optic disk in a glaucoma screening programme [3]. Ineligible fundus images can cause loss of diagnostic information and subsequently have a negative impact on downstream analysis [10,11]. To improve diagnostic accuracy, an approach that detects and filters out ineligible images is essential to ensure that subsequent analyses are based on eligible images. Manual screening of ineligible images requires experts and is time-consuming and labor-intensive, especially in large-scale applications. Moreover, it is impractical for automated AI diagnostic systems deployed in the real world to rely on human experts to first check image eligibility. An automated image eligibility verification method is therefore crucial.

Several studies have developed deep learning models to identify poor-quality fundus images for specific diseases [[12], [13], [14]]. However, the generalizability of these models is limited because image quality standards are disease-specific. For example, for glaucoma screening, an image is considered poor-quality when vessels within the optic disk region cannot be discerned [3], whereas for identifying age-related macular degeneration (AMD), an image is considered poor-quality if vessels within the macular region cannot be identified or over 50 % of the macular region is obscured [12]. In short, an image quality assessment model built for one disease may not transfer to another. Moreover, previous studies have focused mainly on automated detection of poor-quality images, whereas automated identification of poor-location images, which also negatively affect subsequent image analyses, has not been well investigated. A versatile automated image eligibility verification method that detects and filters out both poor-quality and poor-location fundus images therefore better meets the demands of real-world settings, as AI-based fundus imaging will ultimately be employed to screen for multiple fundus diseases simultaneously [15].

The purpose of this study is to develop a deep learning-based image eligibility verification system (DLIEVS) that detects and filters out ineligible fundus images, including both poor-quality and poor-location images, so that subsequent image analyses are based only on eligible images. In addition, we evaluated the effectiveness and generalizability of this system using images from different types of fundus cameras in 4 different institutions.
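To make this filter-then-analyse workflow concrete, here is a hypothetical sketch of how a trained verification model could gate a downstream diagnostic model; the function names, argmax decision rule, and retake prompt are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical pre-screening gate: only eligible images reach the
# downstream diagnostic model; ineligible ones trigger an immediate retake.
import torch
import torch.nn.functional as F

CLASSES = ("eligible", "poor-location", "poor-quality")


@torch.no_grad()
def verify_eligibility(verifier, image):  # image: (3, H, W) tensor
    """Return the predicted eligibility class and its probability."""
    verifier.eval()
    probs = F.softmax(verifier(image.unsqueeze(0)), dim=1)[0]
    idx = int(probs.argmax())
    return CLASSES[idx], float(probs[idx])


def screen_and_diagnose(verifier, diagnostic_model, image):
    label, prob = verify_eligibility(verifier, image)
    if label != "eligible":
        # Notify the photographer so the image can be reacquired on the spot.
        raise ValueError(f"Ineligible image ({label}, p={prob:.2f}); please retake.")
    return diagnostic_model(image.unsqueeze(0))
```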

The rest of the paper is organized as follows: Section 2 introduces the databases and methods proposed in this study; Sections 3 and 4 present the performance analyses and clinical value of the proposed methods, respectively; finally, Section 5 provides conclusions and future directions.

Section snippets

Fundus image datasets

A total of 18,031 fundus images from 4 hospitals were used to develop and evaluate the DLIEVS. These data were consecutively obtained between January 2018 and January 2020 using 4 different types of non-mydriatic fundus cameras and included subjects who underwent routine ophthalmic health evaluations, ophthalmology consultations, and fundus lesion examinations. The subjects were examined without mydriasis. All images were deidentified before they were transferred to research investigators.

Characteristics of the datasets

In total, 18,031 images (1,899 poor-quality images, 1,506 poor-location images, and 14,626 eligible images) from 9,188 subjects were used to develop and evaluate the DLIEVS. Detailed information on the datasets from NEH, NOC, PCH, and SCH is summarized in Table 2.

Classification performance of deep learning algorithms in the internal test dataset

Four deep learning algorithms (AlexNet, DenseNet121, Inception V3, and ResNet50) were used to train models for image eligibility verification. Their performance in the internal test dataset is shown in Fig. 3, which illustrates that DenseNet121 achieved the best overall performance, with AUCs of 1.000, 0.999, and 1.000 for eligible, poor-location, and poor-quality images, respectively.
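A minimal fine-tuning sketch for the best-performing backbone might look like the following (PyTorch/torchvision); the dataset layout, preprocessing, and hyperparameters are illustrative assumptions, as the snippet above does not report them.

```python
# Minimal fine-tuning sketch for DenseNet121 on the 3-class task.
# Paths, augmentations, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

model = models.densenet121(weights="DEFAULT")
model.classifier = nn.Linear(model.classifier.in_features, 3)

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Expects class-labelled subfolders, e.g. train/eligible, train/poor_location, ...
train_ds = datasets.ImageFolder("train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(10):  # illustrative epoch count
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```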

Discussion

This study established a DLIEVS to detect ineligible fundus images (poor-quality and poor-location images) and evaluated it in four institutions using different commercially available fundus cameras. Our primary finding was that the DLIEVS could discriminate among eligible, poor-location, and poor-quality images with high accuracy (Table 3). In the external test datasets, the AUCs of the DLIEVS based on the best algorithm (DenseNet121) for detecting eligible, poor-location, and poor-quality images ranged from 0.999 to 1.000, 0.997 to 1.000, and 0.997 to 0.999, respectively.
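The per-class AUCs quoted here imply a one-vs-rest analysis of the three-way softmax outputs; a minimal sketch with scikit-learn, using placeholder labels and scores rather than study data, could look like this:

```python
# One-vs-rest AUC per class, as implied by the reported per-class AUCs.
# y_true and y_score below are placeholders, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

CLASSES = ["eligible", "poor-location", "poor-quality"]

y_true = np.array([0, 1, 2, 0, 2, 1])            # integer class labels
y_score = np.array([[0.90, 0.05, 0.05],          # softmax probabilities
                    [0.10, 0.80, 0.10],
                    [0.05, 0.10, 0.85],
                    [0.70, 0.20, 0.10],
                    [0.20, 0.10, 0.70],
                    [0.15, 0.80, 0.05]])

y_bin = label_binarize(y_true, classes=[0, 1, 2])  # one indicator column per class
for k, name in enumerate(CLASSES):
    auc = roc_auc_score(y_bin[:, k], y_score[:, k])
    print(f"{name}: AUC = {auc:.3f}")
```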

Conclusions and future directions

Our DLIEVS had high sensitivity and specificity for discriminating poor-quality and poor-location images from eligible images in both the internal and external test datasets. DenseNet121 performed better than the other three algorithms (AlexNet, Inception V3, and ResNet50) in detecting ineligible images (all AUCs ≥ 0.997). We believe that this system can function as a pre-screening technique for AI diagnostic systems in real-world clinical settings to ensure that only eligible images will be applied in subsequent image-based analyses.

Author contributions

Conception and design: Z.L., J.J., H.Z., and W.C. Funding obtainment: W.C. Provision of study data: W.C., H.Z., and H.W. Collection and assembly of data: Z.L., Q.Z., K.C., X.L., H.W., and H.Z. Data analysis and interpretation: Z.L., J.J., H.Z., and W.C. Manuscript writing: all authors. Final approval of the manuscript: all authors.

Author statement

None.

Summary table

What was already known on the topic

  • Most high-quality medical AI studies excluded ineligible clinical images from real-world data when developing and evaluating their AI diagnostic systems.

  • Ineligible images, which can compromise downstream analysis, are inevitable in clinical practice due to various factors such as dirty camera lenses, head/body movement, operator errors, and patient noncompliance.

What this study added to our knowledge

  • We developed a robust deep learning-based image eligibility verification system (DLIEVS) that can accurately detect and filter out both poor-quality and poor-location fundus images, and can serve as a pre-screening technique for fundus image-based AI diagnostic systems.

Declaration of Competing Interest

The authors declare that they have no competing interests that could inappropriately influence (bias) their work.

Acknowledgments

This study received funding from the National Key R&D Programme of China (grant no. 2019YFC0840708), and the National Natural Science Foundation of China (grant no. 81970770). The funding organizations played no role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript.

References (31)

  • T.J. Bennett et al., Ophthalmic imaging today: an ophthalmic photographer’s viewpoint - a review, Clin. Exp. Ophthalmol. (2009)

  • E. Ataer-Cansizoglu et al., Computer-based image analysis for plus disease diagnosis in retinopathy of prematurity: performance of the “i-ROP” system and image features associated with expert diagnosis, Transl. Vis. Sci. Technol. (2015)

  • F. Shao et al., Automated quality assessment of fundus images via analysis of illumination, naturalness and structure, IEEE Access (2018)

  • S. Keel et al., Development and validation of a deep-learning algorithm for the detection of neovascular age-related macular degeneration from colour fundus photographs, Clin. Exp. Ophthalmol. (2019)

  • D. Ting et al., Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes, JAMA (2017)

¹ These authors contributed equally as first authors.