Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems



arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-03-04, DOI: arxiv-2103.03048
Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Erdi, Christopher Kanan

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing and localizing disease on medical images, and improving radiologists' efficiency. A critical component of deploying AI in radiology is gaining confidence in a developed system's efficacy and safety. The current gold-standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time whether a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.


Updated: 2021-03-05
