A Unified Efficient Pyramid Transformer for Semantic Segmentation
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-07-29, DOI: arxiv-2107.14209
Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most of the literature focuses on either context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts. We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling. In addition, a separate spatial branch is introduced to capture image details for boundary refinement. The whole model can be trained in an end-to-end manner. We demonstrate promising performance on three popular benchmarks for semantic segmentation with a low memory footprint. Code will be released soon.
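To make the described design concrete, the following is a minimal PyTorch-style sketch of the two-branch idea from the abstract: a transformer branch that attends to a sparsely sampled subset of positions for efficient context modeling, and a lightweight spatial branch that keeps image detail for boundary refinement. All module names, the fusion scheme, and hyperparameters (SparseContextBranch, SpatialBranch, num_samples, and so on) are illustrative assumptions and do not come from the released UN-EPT code.

```python
# Minimal sketch (not the authors' implementation) of the two-branch design the
# abstract describes: sparse-sampled transformer attention for context, plus a
# detail-preserving spatial branch. All names and settings are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseContextBranch(nn.Module):
    """Each query attends to a small sampled subset of positions instead of all
    H*W locations, keeping attention memory low."""

    def __init__(self, dim=256, num_heads=8, num_samples=64):
        super().__init__()
        self.num_samples = num_samples
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # Sparse sampling: keep a random subset of positions as keys/values.
        idx = torch.randperm(h * w, device=feat.device)[: self.num_samples]
        kv = tokens[:, idx]                       # (B, num_samples, C)
        out, _ = self.attn(tokens, kv, kv)        # queries attend to sparse keys
        out = self.norm(tokens + out)
        return out.transpose(1, 2).reshape(b, c, h, w)


class SpatialBranch(nn.Module):
    """Shallow high-resolution branch that retains image detail for boundaries."""

    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, img):
        return self.convs(img)                    # features at 1/8 resolution


class TwoBranchSegmenter(nn.Module):
    """Fuses context and spatial branches, then predicts per-pixel classes."""

    def __init__(self, num_classes=19, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a pyramid backbone
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.context = SparseContextBranch(dim)
        self.spatial = SpatialBranch(3, dim)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, img):
        ctx = self.context(self.backbone(img))    # coarse, context-rich features
        det = self.spatial(img)                   # finer, detail-preserving features
        ctx = F.interpolate(ctx, size=det.shape[-2:], mode="bilinear",
                            align_corners=False)
        logits = self.head(ctx + det)             # simple additive fusion
        return F.interpolate(logits, size=img.shape[-2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = TwoBranchSegmenter(num_classes=19)
    x = torch.randn(1, 3, 256, 256)
    print(model(x).shape)                         # torch.Size([1, 19, 256, 256])
```

The random sampling and additive fusion above are placeholders for whatever the paper actually uses; the point of the sketch is only that attending to num_samples keys instead of all H*W positions keeps the memory footprint of the attention step small.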

Updated: 2021-07-30