Classification of cancer pathology reports: a large-scale comparative study, arXiv - CS - Computation and Language



arXiv - CS - Computation and Language. Pub Date: 2020-06-29, DOI: arxiv-2006.16370
Stefano Martina, Leonardo Ventura, Paolo Frasconi

We report on the application of state-of-the-art deep learning techniques to the automatic and interpretable assignment of ICD-O3 topography and morphology codes to free-text cancer reports. We present results on a large dataset (more than 80,000 labeled and 1,500,000 unlabeled anonymized reports, written in Italian and collected from hospitals in Tuscany over more than a decade) with a large number of classes (134 morphological classes and 61 topographical classes). We compare alternative architectures in terms of prediction accuracy and interpretability, and show that our best model achieves a multiclass accuracy of 90.3% on topography site assignment and 84.8% on morphology type assignment. We find that, in this context, hierarchical models are not better than flat models, and that an element-wise maximum aggregator is slightly better than attentive models on site classification. Moreover, the maximum aggregator offers a way to interpret the classification process.
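To make the idea of an element-wise maximum aggregator concrete, the following is a minimal Python sketch. It is not the authors' model: the abstract gives no implementation details, so the function names `max_aggregate` and `token_influence`, the toy tokens, and the feature values are all assumptions made for illustration. The sketch pools per-token feature vectors by taking the maximum of each feature, and records which token supplied each maximum, which is one plausible way such an aggregator could support interpretation.

```python
from collections import Counter


def max_aggregate(token_vectors):
    """Element-wise maximum over a sequence of equal-length token vectors.

    Returns (pooled, winners): pooled[j] is the maximum of feature j over
    all tokens, and winners[j] is the index of the token that supplied it
    (the first such token if several tie).
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    pooled, winners = [], []
    for j in range(dim):
        column = [vec[j] for vec in token_vectors]
        # max() over indices returns the first index holding the largest value
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


def token_influence(tokens, winners):
    """Pair each token with the number of pooled features it supplied."""
    counts = Counter(winners)
    return [(token, counts.get(i, 0)) for i, token in enumerate(tokens)]


# Hypothetical tokens and made-up feature values, for illustration only.
tokens = ["carcinoma", "of", "colon"]
vectors = [
    [0.9, 0.1, 0.2, 0.7],
    [0.0, 0.2, 0.1, 0.1],
    [0.3, 0.8, 0.6, 0.2],
]
pooled, winners = max_aggregate(vectors)
print(pooled)                            # [0.9, 0.8, 0.6, 0.7]
print(token_influence(tokens, winners))  # [('carcinoma', 2), ('of', 0), ('colon', 2)]
```

In this toy run the function word supplies none of the pooled features, so a classifier fed the pooled vector would depend only on the two content words. An attention-based aggregator would instead form a weighted sum in which every token contributes to every feature.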


Last updated: 2020-07-01
