Towards Effective Deep Embedding for Zero-Shot Learning
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4) · Pub Date: 2020-09-01 · DOI: 10.1109/tcsvt.2020.2984666
Lei Zhang, Peng Wang, Lingqiao Liu, Chunhua Shen, Wei Wei, Yanning Zhang, Anton van den Hengel

Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample is matched against all candidate class-level semantic descriptions and assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of the embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly around the semantic description embedding of that class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network that simultaneously maps semantic descriptions and visual samples into a joint space, in which visual embeddings are forced to regress to their class-level semantic embeddings and embeddings from different classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo-labeling strategy that progressively incorporates the testing samples into the training process and thus balances the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.
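To make the two criteria concrete, the sketch below illustrates the kind of two-branch embedding network and loss the abstract describes: a regression term pulls each visual embedding toward its class-level semantic embedding (intra-class compactness), and a trainable classifier over the joint space keeps embeddings of different classes apart (inter-class separability), with nearest-neighbor matching against candidate class embeddings at test time. This is a minimal PyTorch sketch, not the authors' released implementation; the feature dimensions (e.g., 2048-d visual features, 85-d attribute vectors), layer sizes, loss weighting, and function names such as `zsl_losses` and `predict` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoBranchEmbedding(nn.Module):
    """Maps visual features and class-level semantic vectors into a joint space."""

    def __init__(self, visual_dim=2048, semantic_dim=85, embed_dim=512, num_seen_classes=40):
        super().__init__()
        # Visual branch: projects image features (e.g., CNN activations) into the joint space.
        self.visual_branch = nn.Sequential(
            nn.Linear(visual_dim, 1024), nn.ReLU(),
            nn.Linear(1024, embed_dim),
        )
        # Semantic branch: projects class descriptions (e.g., attribute vectors) into the same space.
        self.semantic_branch = nn.Sequential(
            nn.Linear(semantic_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Trainable classifier over the joint space; it enforces inter-class separability.
        self.classifier = nn.Linear(embed_dim, num_seen_classes)


def zsl_losses(model, visual_feats, labels, class_semantics, lambda_cls=1.0):
    """Intra-class compactness (regression) plus inter-class separability (classification)."""
    v = model.visual_branch(visual_feats)        # (B, d) visual embeddings
    s = model.semantic_branch(class_semantics)   # (C, d) class-level semantic embeddings
    # Compactness: each visual embedding regresses to its own class's semantic embedding.
    loss_reg = F.mse_loss(v, s[labels])
    # Separability: embeddings must remain distinguishable by the trainable classifier.
    loss_cls = F.cross_entropy(model.classifier(v), labels)
    return loss_reg + lambda_cls * loss_cls


@torch.no_grad()
def predict(model, visual_feats, candidate_semantics):
    """Assign each sample to the nearest candidate class in the joint space."""
    v = F.normalize(model.visual_branch(visual_feats), dim=-1)
    s = F.normalize(model.semantic_branch(candidate_semantics), dim=-1)
    return (v @ s.t()).argmax(dim=-1)  # index of the nearest class by cosine similarity
```

In the transductive extension described above, `predict` could also be run on unlabeled test samples to generate pseudo labels, with the most confident ones progressively added to the training set to counter the bias toward seen classes; the confidence threshold and schedule are design choices not specified in the abstract.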

Updated: 2020-09-01