Test-retest reproducibility of a deep learning-based automatic detection algorithm for the chest radiograph.
European Radiology ( IF 4.7 ) Pub Date : 2020-01-03 , DOI: 10.1007/s00330-019-06589-8
Hyungjin Kim 1, 2 , Chang Min Park 1, 2, 3 , Jin Mo Goo 1, 2, 3

OBJECTIVES To perform test-retest reproducibility analyses for a deep learning-based automatic detection algorithm (DLAD) using two stationary chest radiographs (CRs) taken at a short-term interval, to analyze the factors that influence test-retest variation, and to investigate the robustness of DLAD to simulated post-processing and positional changes.

METHODS This retrospective study included patients with pulmonary nodules resected in 2017. Preoperative CRs without interval changes were used. Test-retest reproducibility was analyzed in terms of median differences of abnormality scores, intraclass correlation coefficients (ICC), and 95% limits of agreement (LoA). Factors associated with test-retest variation were investigated using univariable and multivariable analyses. Shifts in classification between the two CRs were analyzed using pre-determined cutoffs. Radiograph post-processing (blurring and sharpening) and positional changes (translations along the x- and y-axes, rotation, and shearing) were simulated, and the agreement of abnormality scores between the original and simulated CRs was investigated.

RESULTS The study analyzed 169 patients (median age, 65 years; 91 men). The median difference of abnormality scores was 1-2%, and ICC ranged from 0.83 to 0.90. The 95% LoA was approximately ±30%. Test-retest variation was negatively associated with solid portion size (β, -0.50; p = 0.008) and good nodule conspicuity (β, -0.94; p < 0.001). A small fraction of patients (15/169) showed discordant classifications when the high-specificity cutoff (46%) was applied to the model outputs (p = 0.04). DLAD was robust to the simulated positional changes (ICC, 0.984, 0.996), but relatively less robust to post-processing (ICC, 0.872, 0.968).

CONCLUSIONS DLAD was robust to test-retest variation. However, inconspicuous nodules may cause fluctuations of the model output and subsequent misclassifications.
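The two agreement statistics named in the methods, ICC and 95% LoA, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which ICC form was used, so the sketch picks ICC(2,1) (two-way random effects, absolute agreement, single measurement), and it computes the LoA in the usual Bland-Altman way as the mean difference ± 1.96 standard deviations of the paired differences. The function names and the scores are hypothetical and are not the study data.

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest):
    """Return (bias, lower, upper) for paired scores.

    Assumption: Bland-Altman style limits, bias +/- 1.96 * SD of differences.
    """
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - half_width, bias + half_width


def icc_2_1(rows):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    rows holds one list per patient, with one score per radiograph.
    Assumption: the abstract does not state which ICC form the authors used.
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical abnormality scores (%), not taken from the study
test_scores = [12.0, 55.0, 80.0, 33.0, 91.0]
retest_scores = [15.0, 50.0, 78.0, 40.0, 90.0]

bias, lower, upper = limits_of_agreement(test_scores, retest_scores)
icc = icc_2_1([list(pair) for pair in zip(test_scores, retest_scores)])
print(f"bias={bias:.2f}, LoA=({lower:.2f}, {upper:.2f}), ICC(2,1)={icc:.3f}")
```

Read this way, a 95% LoA of roughly ±30% means that about 95% of test-retest score differences would be expected to fall within 30 percentage points of the mean difference, assuming the differences are approximately normally distributed.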
KEY POINTS
• The deep learning-based automatic detection algorithm was generally robust to the test-retest variation of chest radiographs.
• Test-retest variation was negatively associated with solid portion size and good nodule conspicuity.
• The high-specificity cutoff (46%) resulted in discordant classifications in 8.9% of patients (15/169; p = 0.04) between the test and retest radiographs.
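The discordance figure in the last key point is simple arithmetic, sketched below. The sketch assumes a radiograph is classified as positive when its abnormality score is at or above the cutoff; the abstract does not state the exact rule. The function names and the example scores are hypothetical.

```python
def discordant_count(test, retest, cutoff):
    """Count pairs whose classification flips between the two radiographs.

    Assumption: a score at or above the cutoff counts as positive.
    """
    return sum((a >= cutoff) != (b >= cutoff) for a, b in zip(test, retest))


def discordance_rate(n_discordant, n_total):
    """Discordant classifications as a percentage of all patients."""
    return 100.0 * n_discordant / n_total


# Hypothetical abnormality scores (%), not taken from the study
test_scores = [10.0, 44.0, 47.0, 90.0]
retest_scores = [12.0, 48.0, 45.0, 88.0]

# The second and third pairs straddle the 46% cutoff
print(discordant_count(test_scores, retest_scores, cutoff=46.0))

# Reported counts from the key points: 15 discordant patients out of 169
print(round(discordance_rate(15, 169), 1))
```

Scores near the cutoff are the ones that flip, which fits the conclusion that inconspicuous nodules, with their more variable output, are the likely source of misclassification.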

Updated: 2020-01-04