Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy, npj Digital Medicine


npj Digital Medicine (IF 15.2), Pub Date: 2020-03-23, DOI: 10.1038/s41746-020-0247-1
Michelle Y T Yip ₁,₂, Gilbert Lim ₁,₃, Zhan Wei Lim ₃, Quang D Nguyen ₁, Crystal C Y Chong ₁, Marco Yu ₁, Valentina Bellemo ₁, Yuchen Xie ₁, Xin Qi Lee ₁, Haslina Hamzah ₁, Jinyi Ho ₁, Tien-En Tan ₁, Charumathi Sabanayagam ₁,₂, Andrzej Grzybowski ₄,₅, Gavin S W Tan ₁,₂, Wynne Hsu ₃, Mong Li Lee ₃, Tien Yin Wong ₁,₂, Daniel S W Ting ₁,₂,₆


Deep learning (DL) has been shown to be effective in developing diabetic retinopathy (DR) algorithms, possibly tackling the financial and manpower challenges that hinder implementation of DR screening. However, our systematic review of the literature reveals that few studies have examined the impact of different factors on these DL algorithms, which is important for clinical deployment in real-world settings. Using 455,491 retinal images, we evaluated two technical and three image-related factors in the detection of referable DR. For technical factors, we evaluated the performance of four DL models (VGGNet, ResNet, DenseNet, Ensemble) and two computational frameworks (Caffe, TensorFlow). For image-related factors, we evaluated image compression levels (image size reduced to 350, 300, 250, 200 and 150 KB), number of fields (7-field, 2-field, 1-field) and media clarity (pseudophakic vs phakic). In the detection of referable DR, the four DL models showed comparable diagnostic performance (AUC 0.936-0.944). For the VGGNet model, the two computational frameworks yielded similar AUC (0.936). DL performance dropped when image size decreased below 250 KB (AUC 0.936, 0.900, p < 0.001). DL performance improved as the number of fields increased (dataset 1: 2-field vs 1-field, AUC 0.936 vs 0.908, p < 0.001; dataset 2: 7-field vs 2-field vs 1-field, AUC 0.949 vs 0.911 vs 0.895). DL performed better in pseudophakic than in phakic eyes (AUC 0.918 vs 0.833, p < 0.001). Image-related factors play a more significant role than technical factors in determining diagnostic performance, suggesting the importance of robust training and testing datasets for DL training and deployment in real-world settings.
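Every comparison in the abstract is reported as an AUC (area under the ROC curve). As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen referable case receives a higher score than a randomly chosen non-referable case. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper; the study's own evaluation code and the statistical test behind its p-values are not described on this page and are not reproduced here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 1 = referable DR, 0 = non-referable (hypothetical encoding).
    scores: model outputs, higher meaning more likely referable.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy illustration only: the same six eyes scored under two hypothetical
# imaging conditions (for example, two compression levels).
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
scores_b = [0.9, 0.6, 0.2, 0.5, 0.3, 0.1]

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(f"condition A: AUC = {auc_a:.3f}")
print(f"condition B: AUC = {auc_b:.3f}")
```

The quadratic pairwise loop is chosen for readability; on datasets the size of the one in the study, a rank-based implementation from an established statistics library would be the practical choice.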


Last updated: 2020-03-23
