Zero-shot semantic segmentation via spatial and multi-scale aware visual class embedding
Pattern Recognition Letters (IF 3.9), Pub Date: 2022-04-20, DOI: 10.1016/j.patrec.2022.04.011
Sungguk Cha, Yooseung Wang

Fully supervised semantic segmentation technologies have brought a paradigm shift in scene understanding. However, the burden of expensive labeling costs remains a challenge. To reduce this cost, recent studies have proposed language-model-based zero-shot semantic segmentation (L-ZSSS) approaches. In this paper, we argue that L-ZSSS is limited in generalization, which is a virtue of zero-shot learning. To tackle this limitation, we propose a language-model-free zero-shot semantic segmentation framework, the Spatial and Multi-scale aware Visual Class Embedding Network (SM-VCENet). By leveraging vision-oriented class embeddings, SM-VCENet enriches the visual information of the class embedding through multi-scale attention and spatial attention. We also propose a novel benchmark (PASCAL2COCO) for zero-shot semantic segmentation, which evaluates generalization through domain adaptation and contains visually challenging samples. In experiments, our SM-VCENet outperforms the zero-shot semantic segmentation state of the art by a significant margin on both the PASCAL-5i and PASCAL2COCO benchmarks.
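The abstract names the ingredients but gives no implementation details. The PyTorch module below is a minimal sketch of a vision-only class embedding enriched by spatial attention and multi-scale pooling, under assumed shapes and design choices; the module name, scale set, and fusion layer are hypothetical illustrations, not the authors' SM-VCENet implementation.

```python
# Minimal sketch of a vision-oriented class embedding with spatial and
# multi-scale attention, in the spirit of the abstract. All names, shapes,
# and design choices are illustrative assumptions, not SM-VCENet itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualClassEmbedding(nn.Module):
    def __init__(self, channels: int = 256, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # Spatial attention: a 1-channel saliency map over support features.
        self.spatial = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        # Fuse the per-scale pooled vectors into one class embedding.
        self.fuse = nn.Linear(channels * len(scales), channels)

    def forward(self, feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone features of a support image;
        # mask: (B, 1, h, w) float binary mask of the novel class.
        mask = F.interpolate(mask, size=feat.shape[-2:], mode="nearest")
        attn = torch.sigmoid(self.spatial(feat))      # spatial attention map
        weighted = feat * mask * attn                 # focus on the class region
        pooled = []
        for s in self.scales:                         # multi-scale pooling
            p = F.adaptive_avg_pool2d(weighted, s)    # (B, C, s, s)
            pooled.append(p.mean(dim=(-2, -1)))       # (B, C) per scale
        return self.fuse(torch.cat(pooled, dim=1))    # (B, C) class embedding

# Example usage (shapes are illustrative):
embedder = VisualClassEmbedding(channels=256)
feat = torch.randn(1, 256, 32, 32)    # support-image features
mask = torch.ones(1, 1, 128, 128)     # support mask of the novel class
class_vec = embedder(feat, mask)      # (1, 256) vision-only class embedding
```

In a full zero-shot pipeline such an embedding would typically be correlated with query-image features to predict a mask for the unseen class; the abstract does not specify this stage, so everything beyond the named attention mechanisms is an assumption here.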




Updated: 2022-04-20