Development of a deep learning-based image eligibility verification system for detecting and filtering out ineligible fundus images: A multicentre study

https://doi.org/10.1016/j.ijmedinf.2020.104363

Highlights

  • DLIEVS can accurately detect both poor-quality and poor-location fundus images.

  • DLIEVS can immediately notify the photographer when ineligible images are produced.

  • DLIEVS has the potential to reduce the negative impacts caused by ineligible images.

  • DLIEVS can serve as a pre-screening technique for fundus image-based AI systems.

Abstract

Background

Recent advances in artificial intelligence (AI) have shown great promise in detecting some diseases from medical images. Most studies developed AI diagnostic systems using only eligible images. However, in real-world settings, ineligible images (including poor-quality and poor-location images) that can compromise downstream analysis are inevitable, leading to uncertainty about the performance of these AI systems. This study aims to develop a deep learning-based image eligibility verification system (DLIEVS) for detecting and filtering out ineligible fundus images.

Methods

A total of 18,031 fundus images (9,188 subjects) collected from 4 clinical centres were used to develop and evaluate the DLIEVS for detecting eligible, poor-location, and poor-quality fundus images. Four deep learning algorithms (AlexNet, DenseNet121, Inception V3, and ResNet50) were used to train models, and the best-performing model was selected for the DLIEVS. The performance of the DLIEVS was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, as compared with a reference standard determined by retina experts.
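As a point of reference for how such a four-way comparison could be set up, the following is a minimal sketch (PyTorch/torchvision) that instantiates the four backbones with a three-class head (eligible, poor-location, poor-quality); the paper does not report its exact configurations, so the pretrained-weight choices and head replacements shown here are assumptions.

```python
# Hypothetical sketch: the four compared CNN backbones with a 3-class head.
# Weight choices and head replacements are assumptions, not the authors' setup.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # eligible, poor-location, poor-quality


def build_backbone(name: str) -> nn.Module:
    if name == "alexnet":
        m = models.alexnet(weights="DEFAULT")
        m.classifier[6] = nn.Linear(m.classifier[6].in_features, NUM_CLASSES)
    elif name == "densenet121":
        m = models.densenet121(weights="DEFAULT")
        m.classifier = nn.Linear(m.classifier.in_features, NUM_CLASSES)
    elif name == "inception_v3":
        m = models.inception_v3(weights="DEFAULT")  # expects 299x299 inputs
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
        m.AuxLogits.fc = nn.Linear(m.AuxLogits.fc.in_features, NUM_CLASSES)
    elif name == "resnet50":
        m = models.resnet50(weights="DEFAULT")
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    else:
        raise ValueError(f"unknown backbone: {name}")
    return m


backbones = {n: build_backbone(n)
             for n in ("alexnet", "densenet121", "inception_v3", "resnet50")}
```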

Results

In the internal test dataset, the best algorithm (DenseNet121) achieved AUCs of 1.000, 0.999, and 1.000 for the classification of eligible, poor-location, and poor-quality images, respectively. In the external test datasets, the AUCs of the best algorithm (DenseNet121) for detecting eligible, poor-location, and poor-quality images ranged from 0.999 to 1.000, 0.997 to 1.000, and 0.997 to 0.999, respectively.

Conclusions

Our DLIEVS can accurately discriminate poor-quality and poor-location images from eligible images. This system has the potential to serve as a pre-screening technique to filter out ineligible images obtained in real-world settings, ensuring that only eligible images are applied in subsequent image-based AI diagnostic analyses.

Introduction

Deep learning has revolutionized the field of medical artificial intelligence (AI) owing to its excellent diagnostic performance in detecting various diseases [1,2]. For example, based on deep learning, glaucomatous optic neuropathy can be automatically identified from fundus images with an area under the receiver operating characteristic curve (AUC) of 0.986 [3], and patients with COVID-19 pneumonia can be accurately distinguished from patients with other common types of pneumonia and normal controls using computed tomography images with an AUC of 0.971 [4].

Although image-based deep learning algorithms achieve excellent disease-detection performance in laboratory settings, their real-world performance is uncertain because most studies develop and evaluate deep learning models using only eligible images [[3], [4], [5], [6], [7]]. In real-world settings, multiple factors can generate ineligible images, including poor-quality and poor-location images. In ophthalmology, poor-quality fundus images can result from operator errors, patient noncompliance, hardware imperfections, and obscured optical media [8,9]. Poor-location fundus images can be caused by patients’ poor target fixation during fundus imaging, for example an image that does not show the optic disk in a glaucoma screening programme [3]. Ineligible fundus images can cause loss of diagnostic information and subsequently have a negative impact on downstream analysis [10,11]. To improve diagnostic accuracy, an approach that detects and filters out ineligible images is essential to ensure that subsequent analyses are based on eligible images. Manual screening of ineligible images requires experts and is time-consuming and labor-intensive, especially in large-scale applications. Moreover, it is impractical for automated AI diagnostic systems deployed in the real world to rely on human experts to first check image eligibility. An automated image eligibility verification method is therefore crucial.

Several studies have developed deep learning models to identify poor-quality fundus images for specific diseases [[12], [13], [14]]. However, the generalizability of these models is limited because image quality standards are disease-specific. For example, for glaucoma screening, an image is considered poor-quality when vessels within the optic disk region cannot be discerned [3], whereas for identifying age-related macular degeneration (AMD), an image is considered poor-quality if vessels within the macular region cannot be identified or over 50 % of the macular region is obscured [12]. In short, an image quality assessment model built for one disease may not transfer to another. Moreover, previous studies have focused mainly on automated detection of poor-quality images, whereas automated identification of poor-location images, which also negatively affect subsequent image analyses, has not been well investigated. A versatile automated image eligibility verification method that detects and filters out both poor-quality and poor-location fundus images therefore better meets the demands of real-world settings, as AI-based fundus imaging will ultimately be employed to screen for multiple fundus diseases simultaneously [15].

The purpose of this study is to develop a deep learning-based image eligibility verification system (DLIEVS) that detects and filters out ineligible fundus images, including both poor-quality and poor-location images, so that subsequent image analyses are based only on eligible images. In addition, we evaluated the effectiveness and generalizability of this system using images from different types of fundus cameras in 4 different institutions.
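To make this filter-then-analyse workflow concrete, here is a hypothetical sketch of how a trained verification model could gate a downstream diagnostic model; the function names, argmax decision rule, and retake prompt are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical pre-screening gate: only eligible images reach the
# downstream diagnostic model; ineligible ones trigger an immediate retake.
import torch
import torch.nn.functional as F

CLASSES = ("eligible", "poor-location", "poor-quality")


@torch.no_grad()
def verify_eligibility(verifier, image):  # image: (3, H, W) tensor
    """Return the predicted eligibility class and its probability."""
    verifier.eval()
    probs = F.softmax(verifier(image.unsqueeze(0)), dim=1)[0]
    idx = int(probs.argmax())
    return CLASSES[idx], float(probs[idx])


def screen_and_diagnose(verifier, diagnostic_model, image):
    label, prob = verify_eligibility(verifier, image)
    if label != "eligible":
        # Notify the photographer so the image can be reacquired on the spot.
        raise ValueError(f"Ineligible image ({label}, p={prob:.2f}); please retake.")
    return diagnostic_model(image.unsqueeze(0))
```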

The rest of the paper is organized as follows: Section 2 introduces the databases and methods proposed in this study; Sections 3 and 4 present the performance analyses and clinical value of the proposed methods, respectively; finally, Section 5 provides conclusions and future directions.

Section snippets

Fundus image datasets

A total of 18,031 fundus images from 4 hospitals were used to develop and evaluate the DLIEVS. These data were consecutively obtained between January 2018 and January 2020 using 4 different types of non-mydriatic fundus cameras and included subjects who underwent routine ophthalmic health evaluations, ophthalmology consultations, and fundus lesion examinations. The subjects were examined without mydriasis. All images were deidentified before they were transferred to research investigators.

Characteristics of the datasets

In total, 18,031 images (1,899 poor-quality images, 1,506 poor-location images, and 14,626 eligible images) from 9,188 subjects were used to develop and evaluate the DLIEVS. Detailed information on the datasets from NEH, NOC, PCH, and SCH is summarized in Table 2.

Classification performance of deep learning algorithms in the internal test dataset

Four deep learning algorithms (AlexNet, DenseNet121, Inception V3, and ResNet50) were used to train models for image eligibility verification. Their performance in the internal test dataset is shown in Fig. 3, which illustrates that DenseNet121 achieved the best overall performance, with AUCs of 1.000, 0.999, and 1.000 for eligible, poor-location, and poor-quality images, respectively.
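A minimal fine-tuning sketch for the best-performing backbone might look like the following (PyTorch/torchvision); the dataset layout, preprocessing, and hyperparameters are illustrative assumptions, as the snippet above does not report them.

```python
# Minimal fine-tuning sketch for DenseNet121 on the 3-class task.
# Paths, augmentations, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

model = models.densenet121(weights="DEFAULT")
model.classifier = nn.Linear(model.classifier.in_features, 3)

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Expects class-labelled subfolders, e.g. train/eligible, train/poor_location, ...
train_ds = datasets.ImageFolder("train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(10):  # illustrative epoch count
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```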

Discussion

This study established a DLIEVS to detect ineligible fundus images (poor-quality and poor-location images) and evaluated it in four institutions using different commercially available fundus cameras. Our primary finding was that the DLIEVS could discriminate among eligible, poor-location, and poor-quality images with high accuracy (Table 3). In the external test datasets, the AUCs of the DLIEVS based on the best algorithm (DenseNet121) for detecting eligible, poor-location, and poor-quality images ranged from 0.999 to 1.000, 0.997 to 1.000, and 0.997 to 0.999, respectively.
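The per-class AUCs quoted here imply a one-vs-rest analysis of the three-way softmax outputs; a minimal sketch with scikit-learn, using placeholder labels and scores rather than study data, could look like this:

```python
# One-vs-rest AUC per class, as implied by the reported per-class AUCs.
# y_true and y_score below are placeholders, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

CLASSES = ["eligible", "poor-location", "poor-quality"]

y_true = np.array([0, 1, 2, 0, 2, 1])            # integer class labels
y_score = np.array([[0.90, 0.05, 0.05],          # softmax probabilities
                    [0.10, 0.80, 0.10],
                    [0.05, 0.10, 0.85],
                    [0.70, 0.20, 0.10],
                    [0.20, 0.10, 0.70],
                    [0.15, 0.80, 0.05]])

y_bin = label_binarize(y_true, classes=[0, 1, 2])  # one indicator column per class
for k, name in enumerate(CLASSES):
    auc = roc_auc_score(y_bin[:, k], y_score[:, k])
    print(f"{name}: AUC = {auc:.3f}")
```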

Conclusions and future directions

Our DLIEVS had high sensitivity and specificity for discriminating poor-quality and poor-location images from eligible images in both the internal and external test datasets. DenseNet121 performed better than the other three algorithms (AlexNet, Inception V3, and ResNet50) in detecting ineligible images (all AUCs ≥ 0.997). We believe that this system can function as a pre-screening technique for AI diagnostic systems in real-world clinical settings to ensure that only eligible images will be applied in subsequent image-based analyses.

Author contributions

Conception and design: Z.L., J.J., H.Z., and W.C. Funding obtainment: W.C. Provision of study data: W.C., H.Z., and H.W. Collection and assembly of data: Z.L., Q.Z., K.C., X.L., H.W., and H.Z. Data analysis and interpretation: Z.L., J.J., H.Z., and W.C. Manuscript writing: all authors. Final approval of the manuscript: all authors.

Author statement

None.

Summary table

What was already known on the topic

  • Most high-quality medical AI studies excluded ineligible clinical images from real-world data when developing and evaluating their AI diagnostic systems.

  • Ineligible images, which can compromise downstream analysis, are inevitable in clinical practice due to various factors such as dirty camera lenses, head/body movement, operator errors, and patient noncompliance.

What this study added to our knowledge

  • We developed a robust deep learning-based image eligibility verification system (DLIEVS) that can accurately detect and filter out both poor-quality and poor-location fundus images, and can serve as a pre-screening technique for fundus image-based AI diagnostic systems.

Declaration of Competing Interest

The authors declare that they have no competing interests that could inappropriately influence (bias) their work.

Acknowledgments

This study received funding from the National Key R&D Programme of China (grant no. 2019YFC0840708), and the National Natural Science Foundation of China (grant no. 81970770). The funding organizations played no role in the study design, data collection, and analysis, decision to publish, or preparation of the manuscript.

References (31)

  • T.J. Bennett et al., Ophthalmic imaging today: an ophthalmic photographer’s viewpoint - a review, Clin. Exp. Ophthalmol. (2009)

  • E. Ataer-Cansizoglu et al., Computer-based image analysis for plus disease diagnosis in retinopathy of prematurity: performance of the “i-ROP” system and image features associated with expert diagnosis, Transl. Vis. Sci. Technol. (2015)

  • F. Shao et al., Automated quality assessment of fundus images via analysis of illumination, naturalness and structure, IEEE Access (2018)

  • S. Keel et al., Development and validation of a deep-learning algorithm for the detection of neovascular age-related macular degeneration from colour fundus photographs, Clin. Exp. Ophthalmol. (2019)

  • D. Ting et al., Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes, JAMA (2017)

¹ These authors contributed equally as first authors.