Visual Structure Constraint for Transductive Zero-Shot Learning in the Wild
International Journal of Computer Vision (IF 11.6), Pub Date: 2021-04-19, DOI: 10.1007/s11263-021-01451-1
Ziyu Wan, Dongdong Chen, Jing Liao

To recognize objects of unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space from the data of the source (seen) classes, and then directly apply it to the target (unseen) classes. However, for data in the wild, the distributions of the source and target domains may not match well, causing the well-known domain shift problem. Based on the observation that the visual features of test instances can be separated into distinct clusters, we propose a new visual structure constraint on class centers for transductive ZSL to improve the generality of the projection function (i.e., to alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers with the visual cluster centers of the test instances. We also propose two new training strategies to handle data in the wild, where many unrelated images may exist in the test dataset; this realistic setting has not been considered in previous methods. Extensive experiments demonstrate that the proposed visual structure constraint consistently brings substantial performance gains and that the new training strategies generalize well to data in the wild. The source code is available at https://github.com/raywzy/VSC.
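To make the alignment idea concrete, below is a minimal NumPy/SciPy sketch, not the authors' implementation (see the linked repository), of how two of the abstract's strategies could be computed: visual cluster centers are obtained by clustering unlabeled test features, and the symmetric Chamfer distance and bipartite matching distance then measure how well the projected unseen semantic centers align with them. All function names and the synthetic data are illustrative assumptions.

# Hedged sketch of the center-alignment strategies described in the abstract.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans


def visual_cluster_centers(test_features, n_unseen_classes):
    """Cluster unlabeled test-instance features; the resulting cluster centers
    serve as the visual targets for the structure constraint."""
    km = KMeans(n_clusters=n_unseen_classes, n_init=10).fit(test_features)
    return km.cluster_centers_


def symmetric_chamfer(projected_centers, cluster_centers):
    """Symmetric Chamfer distance: each projected semantic center is matched to
    its nearest visual cluster center, and vice versa."""
    d = cdist(projected_centers, cluster_centers)   # pairwise Euclidean distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def bipartite_matching_distance(projected_centers, cluster_centers):
    """One-to-one (Hungarian) matching between the two sets of centers."""
    d = cdist(projected_centers, cluster_centers)
    rows, cols = linear_sum_assignment(d)
    return d[rows, cols].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 2048))   # stand-in for CNN features of test instances
    proj = rng.normal(size=(10, 2048))     # stand-in for projected unseen semantic centers
    centers = visual_cluster_centers(feats, n_unseen_classes=10)
    print("Chamfer:", symmetric_chamfer(proj, centers))
    print("Bipartite:", bipartite_matching_distance(proj, centers))

In the paper's transductive setting these distances would act as a training loss on the projection function; the Wasserstein-distance variant mentioned in the abstract would additionally weight the matching by cluster mass, which is omitted here for brevity.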

Updated: 2021-04-19