Generalized zero-shot classification via iteratively generating and selecting unseen samples
Introduction
With the rapid development of deep learning techniques, image classification has advanced rapidly thanks to the huge amount of labeled data. However, the labeling process is expensive and tedious. In addition, samples of new classes constantly emerge without any training samples. To address these problems, zero-shot learning (ZSL) [1] has gained great attention. ZSL deals with the case where the test classes are unavailable in the training stage. The training classes are seen classes with a large amount of labeled data. The test classes are unseen classes without labeled data, which are disjoint from the seen classes. Fortunately, relationships between seen and unseen classes are available through shared semantic representations such as semantic attributes [2], [3] and word vectors [4]. The semantic representations of seen (unseen) classes are called seen (unseen) class prototypes. Thus, we can transfer knowledge from seen classes to unseen classes through their shared semantic representations.
Many attempts have been made to deal with the ZSL problem by projecting visual representations into the semantic space [3], [4]. Test samples can be projected into the semantic space and compared with the unseen class prototypes; each sample is then assigned the class of its closest unseen prototype. These methods suffer from the domain shift problem [5], because the projection function is learned from seen samples and may not suit the unseen samples. Some ZSL methods classify unseen samples in a latent space into which both visual and semantic representations are projected [5], [6], but such a latent space is difficult to find. Other ZSL methods classify unseen samples in the visual space, into which the unseen class prototypes are projected [7]. These methods achieve promising results on ZSL tasks. However, they cannot handle the case where the test set contains both seen and unseen samples, which is a more realistic situation. Generalized zero-shot classification (GZSC) [1] aims to recognize seen and unseen samples in the target domain based on seen samples from the source domain.
Recently, inspired by the GAN model [8], many generative ZSL methods [9], [10] have been proposed to generate unseen samples from semantic representations and noise. The GZSC problem can then be converted into a conventional classification problem over the seen data and the generated unseen data. Existing generative ZSL models assume that a generator able to produce high-quality seen data can also produce high-quality unseen data. However, a generator trained only on seen classes is not well suited to generating unseen samples. To handle this bias problem, we train the generator to generate seen and unseen samples simultaneously. To obtain high-quality unseen samples, the generated unseen samples are pulled toward their real unseen visual prototypes. The real unseen visual prototypes are unavailable, since the unseen samples are missing. Many GZSL methods directly utilize unlabeled unseen data [11], [12]. However, it is not reasonable to assume that the unlabeled data belonging to unseen classes is known in advance. In GZSL, target samples come from either seen or unseen classes. Instead of directly using unlabeled unseen data, we consider that the target domain contains both seen and unseen data. We select confident unseen samples from the target domain and assign them pseudo labels to update the unseen visual prototypes. Thus, we iteratively generate samples from the current GAN model and select confident unseen samples based on the classifier trained on the generated unseen samples. Extensive experiments on four datasets for GZSL tasks demonstrate the superiority of the proposed method over the compared methods. The contributions are summarized as follows:
(1) We propose to iteratively generate unseen samples and select unseen samples from the target domain. Through this iterative process, we can generate high-quality unseen samples and select the most confident unseen samples.
(2) To generate high-quality unseen samples, when training the GAN model we constrain the fake visual prototypes of the generated unseen samples to lie near the real unseen visual prototypes. The real unseen visual prototypes are estimated from the selected confident unseen samples.
(3) Since the unseen samples are missing, directly selecting unseen samples is difficult. We select confident unseen samples from the target domain by the agreement of two unseen classifiers: one based on the current unseen visual prototypes, and another trained on the unseen samples generated by the current GAN model.
(4) The final classification model is trained on the real seen samples, the generated unseen samples and the selected confident unseen samples. Compared with previous GZSC methods, our method achieves state-of-the-art results on four widely used datasets.
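The agreement-based selection in contribution (3) and the prototype update in contribution (2) can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' implementation: `select_confident_unseen`, `update_prototypes`, the Euclidean nearest-prototype rule and the confidence threshold are all hypothetical choices for exposition.

```python
import numpy as np

def select_confident_unseen(target_X, unseen_protos, clf_pred, clf_conf,
                            thresh=0.9):
    """Keep target samples on which two unseen classifiers agree.

    target_X     : (n, d) target-domain features
    unseen_protos: (k, d) current unseen visual prototypes
    clf_pred     : (n,) labels from the classifier trained on generated
                   unseen samples (hypothetical external classifier)
    clf_conf     : (n,) that classifier's confidence per sample
    """
    # Classifier 1: nearest unseen visual prototype (Euclidean distance).
    dists = np.linalg.norm(target_X[:, None, :] - unseen_protos[None, :, :],
                           axis=2)
    proto_pred = dists.argmin(axis=1)
    # Select samples where both classifiers agree with high confidence.
    mask = (proto_pred == clf_pred) & (clf_conf >= thresh)
    return target_X[mask], clf_pred[mask], mask

def update_prototypes(unseen_protos, X_sel, y_sel):
    """Re-estimate prototypes as per-class means of the selected samples."""
    protos = unseen_protos.copy()
    for c in np.unique(y_sel):
        protos[c] = X_sel[y_sel == c].mean(axis=0)
    return protos
```

In the full loop, the updated prototypes would feed back into the GAN training of contribution (2), and the classifier producing `clf_pred` would be retrained on the newly generated unseen samples at each iteration.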
The rest of this paper is organized as follows. Section 2 presents related work on GZSL. Section 3 describes our generalized zero-shot classification via iterative generation and selection. Section 4 reports the experiments on the proposed method. Section 5 gives the conclusions.
Related works
Zero-shot learning aims to recognize new classes without labeled data. ZSL is challenging because the seen and unseen classes do not overlap. The main idea is to associate seen and unseen classes through their shared semantic representations. The semantic space can be built from attributes [13], word vectors [4] or text descriptions [14].
In general, many ZSL methods aim to find a shared embedding space between visual and semantic representations using seen-class information. Then unseen
Problem formulation
Formally, the task of GZSL is to recognize seen and unseen samples from the target domain with the help of seen samples from the source domain. For the source domain, a labeled dataset X_s = {x_i^s}_{i=1}^{n_s}, their labels Y_s and their semantic representations A_s are given, in which x_i^s ∈ R^d, and n_s is the number of seen samples. The source samples are all from the seen classes C_s. For the target domain, an unlabeled dataset X_t = {x_j^t}_{j=1}^{n_t} is given, in which x_j^t ∈ R^d, and n_t is
Datasets and experimental settings
We use four benchmark datasets: AwA [3], aPY [2], CUB [52] and SUN [53]. Their detailed experimental settings are given in Table 1 and are the same as in [1]. The image features are ResNet-101 features. These datasets under the GZSL setting accord with the realistic GZSC problem. In ZSL, it is assumed that the test data are from unseen classes; we report the average per-class accuracy over unseen classes for ZSC tasks. In GZSL, it is assumed that the test data are from seen or
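The per-class accuracy averaging described above, together with the harmonic mean of seen and unseen accuracies commonly reported for GZSL following [1], can be computed as in this short sketch (the function names are ours, not from the paper):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies: each class is weighted equally,
    regardless of how many test samples it has."""
    classes = np.unique(y_true)
    accs = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """GZSL H metric: harmonic mean of seen and unseen per-class
    accuracies, penalizing methods that favor one side."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)
```

For example, with true labels [0, 0, 1, 1, 1] and predictions [0, 1, 1, 1, 0], class 0 is 1/2 correct and class 1 is 2/3 correct, giving a per-class accuracy of (1/2 + 2/3)/2 rather than the sample-weighted 3/5.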
Conclusion and future work
We have introduced a novel method for GZSC via iterative generation and selection. The main difficulty of GZSC tasks is that the unseen classes have no training samples. To address this, we generate high-quality unseen samples and select confident unseen samples. We train a GAN model to generate seen and unseen samples simultaneously. To generate high-quality unseen samples, the generated unseen samples are pulled toward the real unseen visual prototypes. The real unseen
CRediT authorship contribution statement
Xiao Li: Conceptualization, Data curation, Formal analysis, Software, Validation, Writing. Min Fang: Conceptualization, Data curation, Formal analysis, Supervision, Project administration, Resources. Bo Chen: Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant no. 61806155, the China Postdoctoral Science Foundation funded project under Grant no. 2018M631125, the Natural Science Foundation of Shaanxi Province, China (Grant nos. 2020GY-062, 2020JQ-323), the Fundamental Research Funds for the Central Universities, China under Grant no. XJS200303, and the Natural Science Foundation of Anhui Province, China under Grant no. 1908085MF186.
References (54)
- Learning object-centric complementary features for zero-shot learning, Signal Process., Image Commun. (2020)
- Learning unseen visual prototypes for zero-shot classification, Knowl.-Based Syst. (2018)
- Adversarial unseen visual feature synthesis for zero-shot learning, Neurocomputing (2019)
- Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- Describing objects by their attributes
- Attribute-based classification for zero-shot visual object categorization, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
- Zero-shot learning through cross-modal transfer
- Transductive multi-view embedding for zero-shot recognition and annotation
- Zero-shot learning via latent space encoding, IEEE Trans. Cybern. (2018)
- Predicting visual exemplars of unseen classes for zero-shot learning
- Generative adversarial nets
- Feature generating networks for zero-shot learning
- Leveraging the invariant side of generative zero-shot learning
- Zero-VAE-GAN: generating unseen features for generalized and transductive zero-shot learning, IEEE Trans. Image Process.
- Transductive unbiased embedding for zero-shot learning
- Zero-shot learning with semantic output codes
- A generative adversarial approach for zero-shot learning from noisy texts
- Attribute-based classification with label-embedding
- An embarrassingly simple approach to zero-shot learning
- Semantic autoencoder for zero-shot learning
- Unsupervised domain adaptation for zero-shot learning
- Matrix tri-factorization with manifold regularizations for zero-shot learning
- Low-rank embedded ensemble semantic dictionary for zero-shot learning
- Attribute-guided network for cross-modal zero-shot hashing, IEEE Trans. Neural Netw. Learn. Syst.
- Deep ranking for image zero-shot multi-label classification, IEEE Trans. Image Process.
- Region graph embedding network for zero-shot learning