Segmenting Transparent Object in the Wild with Transformer
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-01-21 , DOI: arxiv-2101.08461
Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset. Unlike Trans10K-v1 that only has two limited categories, our new dataset has several appealing benefits. (1) It has 11 fine-grained categories of transparent objects, commonly occurring in the human domestic environment, making it more practical for real-world application. (2) Trans10K-v2 brings more challenges for the current advanced segmentation methods than its former version. Furthermore, a novel transformer-based segmentation pipeline termed Trans2Seg is proposed. Firstly, the transformer encoder of Trans2Seg provides the global receptive field in contrast to CNN's local receptive field, which shows excellent advantages over pure CNN architectures. Secondly, by formulating semantic segmentation as a problem of dictionary look-up, we design a set of learnable prototypes as the query of Trans2Seg's transformer decoder, where each prototype learns the statistics of one category in the whole dataset. We benchmark more than 20 recent semantic segmentation methods, demonstrating that Trans2Seg significantly outperforms all the CNN-based methods, showing the proposed algorithm's potential ability to solve transparent object segmentation.
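The "dictionary look-up" formulation in the abstract can be illustrated with a toy sketch. This is not the authors' Trans2Seg implementation. It assumes a single attention head, no learned projections, and hand-picked prototype vectors rather than trained ones. The names `prototype_lookup`, `features`, and `prototypes` are hypothetical and chosen only for illustration. Each pixel feature is scored against every class prototype with scaled dot-product similarity, and the best-matching prototype gives the pixel's label.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_lookup(features, prototypes):
    """Assign each pixel feature to its best-matching class prototype.

    features:   list of N pixel feature vectors, each of length d
    prototypes: list of C class prototype vectors, each of length d
                (hypothetical stand-ins for learned decoder queries)
    Returns (labels, probs): labels[i] is the class index for pixel i,
    probs[i] is the softmax distribution over the C classes.
    """
    d = len(prototypes[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product, as in standard attention
    labels, probs = [], []
    for f in features:
        scores = [dot(p, f) * scale for p in prototypes]
        p = softmax(scores)
        probs.append(p)
        labels.append(max(range(len(p)), key=p.__getitem__))
    return labels, probs


# Toy data: 3 class prototypes and 4 pixel features in a 2-D feature space.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
features = [[0.9, 0.1], [0.2, 0.8], [-0.7, -0.6], [0.6, 0.3]]

labels, probs = prototype_lookup(features, prototypes)
print(labels)  # [0, 1, 2, 0]
```

In a trained model the prototypes would be learned parameters and the pixel features would come from the transformer encoder; here both are fixed by hand so the look-up step can be followed in isolation.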

Updated: 2021-01-22