FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2-7-2018. DOI: 10.1109/tpami.2018.2803169
Seungryong Kim, Dongbo Min, Bumsub Ham, Stephen Lin, Kwanghoon Sohn

We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. Unlike traditional dense correspondence approaches for estimating depth or optical flow, semantic correspondence estimation poses additional challenges due to intra-class appearance and shape variations among different instances within the same object or scene category. To robustly match points across semantically similar images, we formulate FCSS using local self-similarity (LSS), which is inherently insensitive to intra-class appearance variations. LSS is incorporated through a proposed convolutional self-similarity (CSS) layer, where the sampling patterns and the self-similarity measure are jointly learned in an end-to-end and multi-scale manner. Furthermore, to address shape variations among different object instances, we propose a convolutional affine transformer (CAT) layer that estimates explicit affine transformation fields at each pixel to transform the sampling patterns and corresponding receptive fields. As training data for semantic correspondence is rather limited, we propose to leverage object candidate priors provided in most existing datasets and also correspondence consistency between object pairs to enable weakly-supervised learning. Experiments demonstrate that FCSS significantly outperforms conventional handcrafted descriptors and CNN-based descriptors on various benchmarks.
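The abstract builds on local self-similarity (LSS), which describes a pixel not by its raw appearance but by how similar its surrounding patch is to nearby patches, making the descriptor robust to intra-class appearance changes. A minimal sketch of a handcrafted LSS descriptor is shown below; the fixed sampling offsets and Gaussian similarity kernel here are illustrative simplifications, standing in for the sampling patterns and self-similarity measure that FCSS learns end-to-end (the function name and parameters are hypothetical, not from the paper):

```python
import numpy as np

def local_self_similarity(image, y, x, patch=3, offsets=None, sigma=1.0):
    """Compute a simple local self-similarity (LSS) descriptor at (y, x).

    Correlates the center patch with patches at sampled offsets and maps
    the sum-of-squared-differences to a similarity in (0, 1] via a
    Gaussian kernel. The fixed `offsets` ring is a hand-picked stand-in
    for the sampling patterns that FCSS learns jointly with the measure.
    """
    if offsets is None:
        # A fixed ring of eight sampling offsets (illustrative only).
        offsets = [(-4, 0), (4, 0), (0, -4), (0, 4),
                   (-3, -3), (-3, 3), (3, -3), (3, 3)]
    r = patch // 2
    center = image[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    desc = []
    for dy, dx in offsets:
        neighbor = image[y + dy - r:y + dy + r + 1,
                         x + dx - r:x + dx + r + 1].astype(float)
        ssd = np.sum((center - neighbor) ** 2)
        desc.append(np.exp(-ssd / (2 * sigma ** 2)))
    return np.array(desc)

# Example: descriptor at the center of a small synthetic image.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0  # a bright square
d = local_self_similarity(img, 16, 16)
print(d.shape)  # one similarity value per sampling offset
```

Because the descriptor records similarity structure rather than absolute intensity, two instances of the same object class with different colors or textures can still produce comparable descriptors; FCSS additionally learns the offsets, the measure, and (via the CAT layer) per-pixel affine deformations of the sampling pattern.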

Updated: 2024-08-22