Generalized zero-shot classification via iteratively generating and selecting unseen samples
Introduction
With the rapid development of deep learning techniques, image classification has advanced rapidly thanks to the huge amount of labeled data. However, the labeling process is expensive and tedious. In addition, samples of new classes constantly emerge without any training samples. To address these problems, zero-shot learning (ZSL) [1] has gained great attention. ZSL deals with the case where the test classes are unavailable in the training stage. The training classes are seen classes with a large amount of labeled data. The test classes are unseen classes without labeled data, which are disjoint from the seen classes. Fortunately, relationships between seen and unseen classes are available through shared semantic representations such as semantic attributes [2], [3] and word vectors [4]. The semantic representations of seen (unseen) classes are called seen (unseen) class prototypes. Thus, we can transfer knowledge from seen classes to unseen classes through their shared semantic representations.
Many attempts have been made to deal with the ZSL problem by projecting visual representations into the semantic space [3], [4]. Test samples can be projected into the semantic space and compared with the unseen class prototypes; each sample is then assigned the class of its closest unseen prototype. These methods suffer from the domain shift problem [5], because the projection function is learned from seen samples and may not suit the unseen samples. Some ZSL methods classify unseen samples in a latent space into which both visual and semantic representations are projected [5], [6], but such a latent space is difficult to find. Other ZSL methods classify unseen samples in the visual space, into which the unseen class prototypes are projected [7]. These methods achieve promising results on ZSL tasks. However, they cannot handle the case where the test set contains both seen and unseen samples, which is a more realistic situation. Generalized zero-shot classification (GZSC) [1] aims to recognize seen and unseen samples in the target domain based on seen samples from the source domain.
Recently, inspired by the GAN model [8], many generative ZSL methods [9], [10] have been proposed to generate unseen samples from semantic representations and noise. The GZSC problem can then be converted into a conventional classification problem over the seen data and the generated unseen data. Existing generative ZSL models assume that a generator able to produce high-quality seen data can also produce high-quality unseen data. However, a generator trained only on seen classes is not well suited to generating unseen samples. To handle this bias problem, we train the generator to generate seen and unseen samples simultaneously. To obtain high-quality unseen samples, the generated unseen samples are pulled toward their real unseen visual prototypes. The real unseen visual prototypes are unavailable, since the unseen samples are missing. Many GZSL methods directly utilize unlabeled unseen data [11], [12]. However, it is not reasonable to assume that the unlabeled data belonging to unseen classes is known in advance. In GZSL, target samples come from either seen or unseen classes. Instead of directly using unlabeled unseen data, we consider that the target domain contains both seen and unseen data. We select confident unseen samples from the target domain and assign them pseudo labels to update the unseen visual prototypes. Thus, we iteratively generate samples from the current GAN model and select confident unseen samples based on the classifier trained on the generated unseen samples. Extensive experiments on four datasets for GZSL tasks demonstrate the superiority of the proposed method over the compared methods. The contributions are summarized as follows:
(1) We propose to iteratively generate unseen samples and select unseen samples from the target domain. Through this iterative process, we can generate high-quality unseen samples and select the most confident unseen samples.
(2) To generate high-quality unseen samples, when training the GAN model we constrain the fake visual prototypes of the generated unseen samples to lie near the real unseen visual prototypes. The real unseen visual prototypes are estimated from the selected confident unseen samples.
(3) Since the unseen samples are missing, directly selecting unseen samples is difficult. We select confident unseen samples from the target domain by the agreement of two unseen classifiers: one based on the current unseen visual prototypes, and another trained on the unseen samples generated by the current GAN model.
(4) The final classification model is trained on the real seen samples, the generated unseen samples and the selected confident unseen samples. Compared with previous GZSC methods, our method achieves state-of-the-art results on four widely used datasets.
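The agreement-based selection in contribution (3) and the prototype update in contribution (2) can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' implementation: `select_confident_unseen`, `update_prototypes`, the Euclidean nearest-prototype rule and the confidence threshold are all hypothetical choices for exposition.

```python
import numpy as np

def select_confident_unseen(target_X, unseen_protos, clf_pred, clf_conf,
                            thresh=0.9):
    """Keep target samples on which two unseen classifiers agree.

    target_X     : (n, d) target-domain features
    unseen_protos: (k, d) current unseen visual prototypes
    clf_pred     : (n,) labels from the classifier trained on generated
                   unseen samples (hypothetical external classifier)
    clf_conf     : (n,) that classifier's confidence per sample
    """
    # Classifier 1: nearest unseen visual prototype (Euclidean distance).
    dists = np.linalg.norm(target_X[:, None, :] - unseen_protos[None, :, :],
                           axis=2)
    proto_pred = dists.argmin(axis=1)
    # Select samples where both classifiers agree with high confidence.
    mask = (proto_pred == clf_pred) & (clf_conf >= thresh)
    return target_X[mask], clf_pred[mask], mask

def update_prototypes(unseen_protos, X_sel, y_sel):
    """Re-estimate prototypes as per-class means of the selected samples."""
    protos = unseen_protos.copy()
    for c in np.unique(y_sel):
        protos[c] = X_sel[y_sel == c].mean(axis=0)
    return protos
```

In the full loop, the updated prototypes would feed back into the GAN training of contribution (2), and the classifier producing `clf_pred` would be retrained on the newly generated unseen samples at each iteration.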
The rest of this paper is organized as follows. Section 2 presents related work on GZSL. Section 3 describes our generalized zero-shot classification via iterative generation and selection. Section 4 reports the experiments on the proposed method. Section 5 gives the conclusions.
Related works
Zero-shot learning aims to recognize new classes without labeled data. ZSL is challenging because the seen and unseen classes do not overlap. The main idea is to associate seen and unseen classes through their shared semantic representations. The semantic space can be built from attributes [13], word vectors [4] or text descriptions [14].
In general, many ZSL methods aim to find a shared embedding space between visual and semantic representations using seen-class information. Then unseen
Problem formulation
Formally, the task of GZSL is to recognize seen and unseen samples from the target domain with the help of seen samples from the source domain. For the source domain, a labeled dataset X_s = {x_i^s}_{i=1}^{n_s}, their labels Y_s and their semantic representations A_s are given, in which x_i^s ∈ R^d, and n_s is the number of seen samples. The source samples are all from the seen classes C_s. For the target domain, an unlabeled dataset X_t = {x_j^t}_{j=1}^{n_t} is given, in which x_j^t ∈ R^d, and n_t is
Datasets and experimental settings
We use four benchmark datasets: AwA [3], aPY [2], CUB [52] and SUN [53]. Their detailed experimental settings are given in Table 1 and are the same as in [1]. The image features are ResNet-101 features. These datasets under the GZSL setting accord with the realistic GZSC problem. In ZSL, it is assumed that the test data are from unseen classes; we report the average per-class accuracy over unseen classes for ZSC tasks. In GZSL, it is assumed that the test data are from seen or
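The per-class accuracy averaging described above, together with the harmonic mean of seen and unseen accuracies commonly reported for GZSL following [1], can be computed as in this short sketch (the function names are ours, not from the paper):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies: each class is weighted equally,
    regardless of how many test samples it has."""
    classes = np.unique(y_true)
    accs = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """GZSL H metric: harmonic mean of seen and unseen per-class
    accuracies, penalizing methods that favor one side."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)
```

For example, with true labels [0, 0, 1, 1, 1] and predictions [0, 1, 1, 1, 0], class 0 is 1/2 correct and class 1 is 2/3 correct, giving a per-class accuracy of (1/2 + 2/3)/2 rather than the sample-weighted 3/5.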
Conclusion and future work
We have introduced a novel method for GZSC via iterative generation and selection. The main difficulty of GZSC tasks is that the unseen classes have no training samples. To address this, we generate high-quality unseen samples and select confident unseen samples. We train a GAN model to generate seen and unseen samples simultaneously. To generate high-quality unseen samples, the generated unseen samples are pulled toward the real unseen visual prototypes. The real unseen
CRediT authorship contribution statement
Xiao Li: Conceptualization, Data curation, Formal analysis, Software, Validation, Writing. Min Fang: Conceptualization, Data curation, Formal analysis, Supervision, Project administration, Resources. Bo Chen: Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant no. 61806155, the China Postdoctoral Science Foundation funded project under Grant no. 2018M631125, the Natural Science Foundation of Shaanxi Province, China (Grant nos. 2020GY-062, 2020JQ-323), the Fundamental Research Funds for the Central Universities, China under Grant no. XJS200303, and the Natural Science Foundation of Anhui Province, China under Grant no. 1908085MF186.
References (54)
- Learning object-centric complementary features for zero-shot learning, Signal Process., Image Commun. (2020)
- Learning unseen visual prototypes for zero-shot classification, Knowl.-Based Syst. (2018)
- Adversarial unseen visual feature synthesis for zero-shot learning, Neurocomputing (2019)
- Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- Describing objects by their attributes
- Attribute-based classification for zero-shot visual object categorization, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
- Zero-shot learning through cross-modal transfer
- Transductive multi-view embedding for zero-shot recognition and annotation
- Zero-shot learning via latent space encoding, IEEE Trans. Cybern. (2018)
- Predicting visual exemplars of unseen classes for zero-shot learning
- Generative adversarial nets
- Feature generating networks for zero-shot learning
- Leveraging the invariant side of generative zero-shot learning
- Zero-VAE-GAN: generating unseen features for generalized and transductive zero-shot learning, IEEE Trans. Image Process.
- Transductive unbiased embedding for zero-shot learning
- Zero-shot learning with semantic output codes
- A generative adversarial approach for zero-shot learning from noisy texts
- Attribute-based classification with label-embedding
- An embarrassingly simple approach to zero-shot learning
- Semantic autoencoder for zero-shot learning
- Unsupervised domain adaptation for zero-shot learning
- Matrix tri-factorization with manifold regularizations for zero-shot learning
- Low-rank embedded ensemble semantic dictionary for zero-shot learning
- Attribute-guided network for cross-modal zero-shot hashing, IEEE Trans. Neural Netw. Learn. Syst.
- Deep ranking for image zero-shot multi-label classification, IEEE Trans. Image Process.
- Region graph embedding network for zero-shot learning