Deep Conformal Supervision: a comparative study
medRxiv - Radiology and Imaging Pub Date : 2024-03-28 , DOI: 10.1101/2024.03.28.24305008
Amir M. Vahdani , Shahriar Faghani

Background: Trustability is crucial for AI models in clinical settings. Conformal prediction, a robust uncertainty quantification framework, has been receiving increasing attention as a valuable tool for improving model trustability. An area of active research is the method of non-conformity score calculation for conformal prediction. Method: We propose deep conformal supervision (DCS), which leverages the intermediate outputs of deep supervision for non-conformity score calculation, via weighted averaging based on the inverse of the mean calibration error for each stage. We benchmarked our method on two publicly available datasets focused on medical image classification: a pneumonia chest radiography dataset and a preprocessed version of the 2019 RSNA Intracranial Hemorrhage dataset. Results: Our method achieved mean coverage errors of 16e-4 (CI: 1e-4, 41e-4) and 5e-4 (CI: 1e-4, 10e-4) compared to baseline mean coverage errors of 28e-4 (CI: 2e-4, 64e-4) and 21e-4 (CI: 8e-4, 3e-4) on the two datasets, respectively. Conclusion: In this non-inferiority study, we observed that the baseline results of conformal prediction already exhibit small coverage errors. Our method shows a relative improvement, particularly noticeable in scenarios involving smaller datasets or smaller acceptable error levels, although this improvement is not statistically significant. Keywords: Deep learning, Conformal prediction, Deep supervision, Uncertainty quantification, Classification
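
The weighting idea in the Method can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the per-stage calibration errors are assumed to be given as inputs, and the non-conformity score (one minus the combined probability of the true class) and the split-conformal threshold rule are common choices assumed here, not details stated in the abstract.

```python
import numpy as np

def inverse_error_weights(stage_errors, eps=1e-12):
    """Weights proportional to 1 / (calibration error) per stage, summing to 1.

    stage_errors is assumed to hold one mean calibration error per
    deep-supervision stage; how that error is measured is left open here.
    """
    inv = 1.0 / (np.asarray(stage_errors, dtype=float) + eps)
    return inv / inv.sum()

def combine_stages(stage_probs, weights):
    """Weighted average of stage outputs.

    stage_probs has shape (n_stages, n_samples, n_classes);
    the result has shape (n_samples, n_classes).
    """
    return np.tensordot(weights, np.asarray(stage_probs, dtype=float), axes=1)

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold from calibration data.

    Assumed score: 1 - probability assigned to the true class.
    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)),
    or infinity when k exceeds the calibration set size.
    """
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1] if k <= n else np.inf

def prediction_sets(test_probs, qhat):
    """Boolean mask (n_samples, n_classes): True where the class is in the set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= qhat
```

In use, the weights would be computed once, the same weights applied to the calibration and test stage outputs, and `conformal_threshold` called on the combined calibration probabilities before building the test prediction sets. The coverage error reported in the Results would then correspond to the gap between the empirical coverage of these sets and the target level 1 - alpha.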

Updated: 2024-03-29