Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations
npj Digital Medicine (IF 15.2), Pub Date: 2022-07-22, DOI: 10.1038/s41746-022-00635-4
Niccolò Marini 1, 2 , Stefano Marchesin 3 , Sebastian Otálora 1, 2 , Marek Wodzinski 1, 4 , Alessandro Caputo 5, 6 , Mart van Rijthoven 7 , Witali Aswolinskiy 7 , John-Melle Bokhorst 7 , Damian Podareanu 8 , Edyta Petters 9 , Svetla Boytcheva 10, 11 , Genziana Buttafuoco 6 , Simona Vatrano 6 , Filippo Fraggetta 6, 12 , Jeroen van der Laak 7, 13 , Maristella Agosti 3 , Francesco Ciompi 7 , Gianmaria Silvello 3 , Henning Muller 1, 14 , Manfredo Atzori 1, 15

The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for large datasets required to train robust computer-aided diagnosis methods that can target the high variability of clinical cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital pathology. The approach includes two components, to automatically extract semantically meaningful concepts from diagnostic reports and use them as weak labels to train convolutional neural networks (CNNs) for histopathology diagnosis. The approach is trained (through 10-fold cross-validation) on 3’769 clinical images and reports, provided by two hospitals and tested on over 11’000 images from private and publicly available datasets. The CNN, trained with automatically generated labels, is compared with the same architecture trained with manual labels. Results show that combining text analysis and end-to-end deep neural networks allows building computer-aided diagnosis tools that reach solid performance (micro-accuracy = 0.908 at image-level) based only on existing clinical data without the need for manual annotations.
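
The two components described in the abstract, report-derived weak labels and image-level micro-accuracy, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the trigger phrases, and the plain substring matching are hypothetical stand-ins, not the text-analysis pipeline used by the authors, and micro-accuracy is taken here to mean accuracy pooled over every (image, class) decision.

```python
# Hypothetical concept vocabulary: class name -> trigger phrases.
# These classes and phrases are illustrative, not taken from the paper.
CONCEPTS = {
    "cancer": ["adenocarcinoma", "carcinoma"],
    "dysplasia": ["dysplasia"],
    "normal": ["normal mucosa", "no abnormalities"],
}


def weak_labels(report: str) -> dict[str, int]:
    """Turn a free-text report into one binary weak label per class.

    Naive substring matching; a real pipeline would also need to handle
    negation, synonyms, and multilingual reports.
    """
    text = report.lower()
    return {
        name: int(any(phrase in text for phrase in phrases))
        for name, phrases in CONCEPTS.items()
    }


def micro_accuracy(y_true: list[dict[str, int]], y_pred: list[dict[str, int]]) -> float:
    """Accuracy pooled over every (image, class) decision."""
    pairs = [
        (truth[name], pred[name])
        for truth, pred in zip(y_true, y_pred)
        for name in CONCEPTS
    ]
    return sum(t == p for t, p in pairs) / len(pairs)


if __name__ == "__main__":
    reports = [
        "Colon biopsy: adenocarcinoma with high-grade dysplasia.",
        "Normal mucosa, no abnormalities.",
    ]
    labels = [weak_labels(r) for r in reports]
    print(labels)
```

In a setup like the one the abstract describes, labels produced this way would replace manual annotations as CNN training targets, and `micro_accuracy` would then compare the network's image-level predictions against reference labels.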




Updated: 2022-07-22