Semantic Disentangling Generalized Zero-Shot Learning
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-01-20 , DOI: arxiv-2101.07978
Zhi Chen, Ruihong Qiu, Sen Wang, Zi Huang, Jingjing Li, Zheng Zhang

Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories. Most GZSL methods learn to synthesize CNN visual features for the unseen classes by leveraging the full semantic information, e.g., tags and attributes, together with the visual features of the seen classes. Within the visual features, we define two types of features, semantic-consistent and semantic-unrelated, which represent the image characteristics annotated in the attributes and the less informative characteristics of images, respectively. By definition, semantic-unrelated information cannot be transferred from seen to unseen classes through the semantic-visual relationship, as the corresponding characteristics are not annotated in the semantic information. The foundation of visual feature synthesis is therefore not always solid: the features of the seen classes may contain semantic-unrelated information that interferes with the alignment between the semantic and visual modalities. To address this issue, we propose a novel feature disentangling approach based on an encoder-decoder architecture that factorizes the visual features of images into these two latent feature spaces and extracts the corresponding representations. Furthermore, a relation module is incorporated into this architecture to learn the semantic-visual relationship, while a total correlation penalty is applied to encourage the disentanglement of the two latent representations. The proposed model aims to distill high-quality semantic-consistent representations that capture the intrinsic features of seen images, which are then taken as the generation target for unseen classes. Extensive experiments on seven GZSL benchmark datasets verify the state-of-the-art performance of the proposed model.
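The components described above can be sketched in a minimal numpy forward pass. All dimensions, weight names, and the total-correlation proxy below are illustrative assumptions, not the paper's actual implementation: two linear encoder heads split a visual feature into a semantic-consistent latent and a semantic-unrelated latent, a decoder reconstructs the feature from both, a relation module scores latent-attribute compatibility, and a cross-covariance term stands in for the total correlation penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 2048-d CNN features,
# 312-d attribute vectors, 64-d latent codes.
D_VIS, D_ATTR, D_Z = 2048, 312, 64

# Encoder: two linear heads factorize a visual feature x into a
# semantic-consistent latent z_s and a semantic-unrelated latent z_u.
W_s = rng.standard_normal((D_VIS, D_Z)) / np.sqrt(D_VIS)
W_u = rng.standard_normal((D_VIS, D_Z)) / np.sqrt(D_VIS)
# Decoder: reconstructs x from the concatenated latents.
W_dec = rng.standard_normal((2 * D_Z, D_VIS)) / np.sqrt(2 * D_Z)
# Relation module: scores compatibility of (z_s, attribute) pairs.
W_rel = rng.standard_normal((D_Z + D_ATTR, 1)) / np.sqrt(D_Z + D_ATTR)

def encode(x):
    return x @ W_s, x @ W_u

def decode(z_s, z_u):
    return np.concatenate([z_s, z_u], axis=1) @ W_dec

def relation_score(z_s, attr):
    # Sigmoid compatibility score in (0, 1) for each (image, class) pair.
    return 1.0 / (1.0 + np.exp(-np.concatenate([z_s, attr], axis=1) @ W_rel))

def total_correlation_proxy(z_s, z_u):
    # Simplified stand-in for the total-correlation penalty: mean squared
    # cross-covariance between the two latent blocks (zero iff the blocks
    # are linearly uncorrelated on this batch).
    zs = z_s - z_s.mean(axis=0)
    zu = z_u - z_u.mean(axis=0)
    cov = zs.T @ zu / len(z_s)
    return float((cov ** 2).mean())

# One forward pass on a dummy batch of 8 images.
x = rng.standard_normal((8, D_VIS))
attr = rng.standard_normal((8, D_ATTR))
z_s, z_u = encode(x)
x_hat = decode(z_s, z_u)
recon_loss = float(((x - x_hat) ** 2).mean())
scores = relation_score(z_s, attr)
tc = total_correlation_proxy(z_s, z_u)
print(x_hat.shape, scores.shape)
```

In training, the reconstruction loss, relation scores, and the penalty would be combined and minimized jointly; only the semantic-consistent latent z_s would then serve as the generation target for unseen classes.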

Updated: 2021-01-21