CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation,arXiv - CS - Computer Vision and Pattern Recognition

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation
arXiv - CS - Computer Vision and Pattern Recognition, Pub Date: 2021-03-04, DOI: arxiv-2103.03024
Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation. The convolutional operations used in these networks, however, are inevitably limited in modeling long-range dependencies because of their inductive biases of locality and weight sharing. Although the Transformer was designed to address this issue, it suffers from extreme computational and spatial complexity when processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations, and an efficient deformable Transformer (DeTrans) is built to model the long-range dependencies on the extracted feature maps. Unlike the vanilla Transformer, which treats all image positions equally, our DeTrans attends only to a small set of key positions by introducing a deformable self-attention mechanism. The computational and spatial complexity of DeTrans is thus greatly reduced, making it possible to process the multi-scale and high-resolution feature maps that are usually of paramount importance for image segmentation. We conduct an extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset, which covers 11 major human organs. The results indicate that CoTr leads to a substantial performance improvement over other CNN-based, Transformer-based, and hybrid methods on the 3D multi-organ segmentation task. Code is available at https://github.com/YtongXie/CoTr
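To make the complexity argument concrete, the following is a minimal sketch of the idea of attending to a small set of key positions, not the authors' DeTrans implementation. It assumes a feature map flattened to a 1D sequence, integer sampling offsets clamped to the valid range (a real deformable attention layer would typically predict fractional offsets from the query and interpolate), and offsets and attention scores passed in as arguments rather than predicted by learned projections. The function names and the parameter `num_keys` are hypothetical.

```python
import math


def attention_pair_counts(depth, height, width, num_keys):
    """Return (full, deformable) query-key pair counts for one 3D feature map.

    Full self-attention compares every position with every other position;
    the deformable variant compares each position with only `num_keys` samples.
    """
    n = depth * height * width
    return n * n, n * num_keys


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention_1d(values, offsets, logits):
    """Toy single-head deformable attention over a flattened sequence.

    values:  N feature vectors, each a list of C floats
    offsets: N lists of K integer offsets (assumption: supplied by the caller;
             a real model would predict them from the query feature)
    logits:  N lists of K unnormalised attention scores (same assumption)
    """
    n = len(values)
    channels = len(values[0])
    output = []
    for query in range(n):
        weights = softmax(logits[query])
        acc = [0.0] * channels
        for weight, offset in zip(weights, offsets[query]):
            # Clamp the sampling position to the sequence bounds
            # (assumption: stands in for interpolation at fractional positions).
            pos = min(max(query + offset, 0), n - 1)
            for ch in range(channels):
                acc[ch] += weight * values[pos][ch]
        output.append(acc)
    return output


if __name__ == "__main__":
    # Hypothetical feature-map size and key count, chosen only for illustration.
    full, deformable = attention_pair_counts(24, 48, 48, num_keys=4)
    print("full attention pairs:      ", full)
    print("deformable attention pairs:", deformable)
```

With N positions and K sampled keys per query, the number of query-key interactions falls from N * N to N * K, which is why a small fixed K keeps high-resolution and multi-scale 3D feature maps tractable.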

Updated: 2021-03-05
