Pattern Recognition Letters, Volume 160, August 2022, Pages 142-147

Baby steps towards few-shot learning with multiple semantics

https://doi.org/10.1016/j.patrec.2022.06.012

Highlights

  • We propose a new setting, closer to ‘infant learning’: Few-Shot Learning with Multiple and Complex Semantics (FSL-MCS).

  • For this setting we propose a new benchmark and an associated training and evaluation protocol.

  • We propose a new multi-branch architecture that provides the first batch of encouraging results on the proposed FSL-MCS benchmark.

Abstract

Learning from one or a few visual examples is one of the key capabilities of humans from early infancy, but it is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible by combining multiple and richer semantics (category labels, attributes, and natural-language descriptions). Using these ideas, we offer the community new results on the popular miniImageNet and CUB few-shot benchmarks, comparing favorably to the previous state of the art for both visual-only and visual-plus-semantics approaches. We also perform an ablation study investigating the components and design choices of our approach. Code is available at github.com/EliSchwartz/mutiple-semantics.

Introduction

Modern-day computer vision has experienced a tremendous leap due to the advent of deep learning (DL) techniques. DL-based approaches now reach performance levels higher even than humans' in tasks requiring expertise, such as recognizing dog breeds or the faces of thousands of celebrities. Yet, despite all these advances, some innate human abilities, available to us at a very young age, still elude modern AI systems. One of these is the ability to learn, and later successfully recognize, new, previously unseen visual categories from only one or very few examples. This ‘few-shot learning’ task has been thoroughly explored in the computer vision literature and numerous approaches have been proposed; see [1] for a review. Yet so far, the performance of even the best few-shot learning methods falls short, by a significant margin, of that of fully supervised methods trained with a large number of examples, e.g., on ImageNet [2]. The core difficulty is adapting a model to novel classes from a few samples without over-fitting.

One key ingredient of human infant learning, which has only very recently found its way into visual few-shot learning approaches, is the semantics that accompanies a provided example. It has been shown in the child-development literature that infants' object-recognition ability is linked to their language skills, and it is hypothesized that this may be related to the ability to describe objects [3]. Indeed, when a parent points a finger at a new category to be learned (‘look, here is a puppy’, Fig. 1), this is commonly accompanied by additional semantic references or descriptions for that category (e.g., ‘look at his nice fluffy ears’, ‘look at his nice silky fur’, ‘the puppy goes woof-woof’). This additional, often rich, semantic information can be very useful to the learner, and has been exploited in the context of zero-shot learning and visual-semantic embeddings. The language and vision domains both describe the same physical world in different ways, and in many cases they contain useful complementary information that can be carried over to a learner in the other domain (visual to language and vice versa).

Only a handful of recent works have used semantics to facilitate few-shot learning. Chen et al. [4] used an embedding vector of either the category label or a given set of category attributes to regularize the latent representation of an auto-encoder, adding a loss that pushes each sample's latent vector as close as possible to the corresponding semantic vector. In [5], the semantic representation of visual categories is learned on top of GloVe [6] word embeddings, jointly with a Proto-Net-based [7] few-shot classifier and with a convex combination of the two. The result of this joint training is a powerful few-shot and zero-shot (i.e., semantic-based) ensemble that surpassed all other few-shot learning methods to date on the challenging miniImageNet benchmark [8]. In both cases, combining few-shot learning with some category semantics (labels or attributes) proved highly beneficial to the performance of the few-shot learner. Yet in both cases, only a single word embedding or a set of several prescribed numerical attributes was used to encode the semantics.
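To make the convex-combination idea concrete, the sketch below blends each class's visual prototype (the few-shot branch) with a semantic class embedding (the zero-shot branch) and classifies queries against the blended prototypes. This is a minimal illustration, not the exact model of [5]: the mixing coefficient `lam` is a fixed scalar here (in [5] it is predicted adaptively), the semantic vectors are assumed to be already projected into the visual feature space, and all names and dimensions are our own hypothetical choices.

```python
import torch

def blend_prototypes(visual_protos, semantic_vecs, lam=0.7):
    """Convex combination of visual prototypes and semantic class
    embeddings, both assumed to live in the same feature space.
    lam = 1 recovers a purely visual Proto-Net classifier,
    lam = 0 a purely semantic (zero-shot) one."""
    return lam * visual_protos + (1.0 - lam) * semantic_vecs

# Toy 5-way task in a 64-d feature space.
visual = torch.randn(5, 64)      # averaged support features per class
semantic = torch.randn(5, 64)    # projected label embeddings (e.g., GloVe)
blended = blend_prototypes(visual, semantic)
queries = torch.randn(10, 64)
pred = torch.cdist(queries, blended).argmin(dim=1)  # nearest blended prototype
print(pred)
```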

In this work, we show that more can be gained by exploring a more realistic, human-like learning setting in which the learner has access to multiple and richer semantics. Depending on what is available for the dataset, these semantics can include: category labels; richer ‘description-level’ semantic information (one or a few natural-language sentences describing the category); or attributes. We demonstrate how this learning-with-semantics setting can facilitate few-shot learning (leveraging the intuition of how human infants learn). The results compare favorably with the previous visual and visual + semantics state of the art on the challenging miniImageNet [8] and CUB [9] few-shot benchmarks.

To summarize, the contributions of this work are three-fold. First, we propose that the community consider a new setting, perhaps closer to ‘infant learning’: Few-Shot Learning with Multiple and Complex Semantics (FSL-MCS). Second, in this context we propose a new benchmark for FSL-MCS and an associated training and evaluation protocol. Third, we propose a new multi-branch network architecture that provides the first batch of encouraging results for the proposed FSL-MCS benchmark.

Section snippets

Few-shot learning

The major approaches to few-shot learning include metric learning, meta-learning (or learning-to-learn), and generative (or augmentation-based) methods.

Few-shot learning by metric learning: Methods of this type [7], [10] learn a non-linear embedding into a metric space, where an L2 nearest-neighbor (or similar) approach is used to classify instances of new categories according to their proximity to the few labeled training examples. Additional proposed variants include [11], which uses a metric
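As a concrete reference point for this family of methods, here is a minimal Proto-Net-style [7] sketch: support embeddings are averaged into per-class prototypes, and each query is labeled by L2 proximity to the nearest prototype. The embedding network itself is omitted (random features stand in for it), and the helper names are ours, not from [7].

```python
import torch

def class_prototypes(support_feats, support_labels, n_way):
    """Average the embedded support examples of each class."""
    return torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in range(n_way)
    ])

def nearest_prototype(query_feats, protos):
    """Classify each query by L2 distance to the class prototypes."""
    return torch.cdist(query_feats, protos).argmin(dim=1)

# Toy 5-way 1-shot episode with 64-d embeddings.
support = torch.randn(5, 64)          # one embedded support image per class
labels = torch.arange(5)
protos = class_prototypes(support, labels, n_way=5)
print(nearest_prototype(torch.randn(3, 64), protos))
```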

Method

Our general model architecture is summarized in Fig. 2. The model comprises a visual information branch built on a CNN backbone that computes features both for the training images of the few-shot task and for the query images. As in Proto-Nets [7], the feature vectors of each category's support examples are averaged to form a visual prototype feature vector V for that category. The visual prototype serves as the first estimate of the prototype, P0 = V. Then, the prototype is
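The snippet above cuts off at the prototype-refinement step, but the multi-branch idea it describes can be sketched as follows: starting from the visual prototype P0 = V, each semantic branch projects its cue (a label embedding, attributes, or an encoded description) into the feature space and folds it into the current prototype via a learned convex combination. This is a hypothetical illustration of the described architecture, with invented module names and layer sizes, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SemanticBranch(nn.Module):
    """Projects one semantic cue into the visual feature space and
    predicts a mixing coefficient for updating the prototype."""
    def __init__(self, sem_dim, feat_dim):
        super().__init__()
        self.project = nn.Linear(sem_dim, feat_dim)
        self.gate = nn.Linear(sem_dim + feat_dim, 1)

    def forward(self, prototype, semantic):
        sem_feat = self.project(semantic)
        lam = torch.sigmoid(self.gate(torch.cat([semantic, prototype], dim=-1)))
        # Convex combination of the current prototype and the semantic estimate.
        return lam * prototype + (1 - lam) * sem_feat

def refine_prototype(visual_proto, semantics, branches):
    """Start from P0 = V and fold in each semantic cue in turn."""
    p = visual_proto
    for branch, sem in zip(branches, semantics):
        p = branch(p, sem)
    return p

# Toy: 5-way task, 64-d visual features, two semantic cues per class.
branches = nn.ModuleList([SemanticBranch(300, 64),   # e.g., GloVe label vector
                          SemanticBranch(512, 64)])  # e.g., encoded description
V = torch.randn(5, 64)                               # visual prototypes P0
sems = [torch.randn(5, 300), torch.randn(5, 512)]
print(refine_prototype(V, sems, branches).shape)     # torch.Size([5, 64])
```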

Experimental results

We evaluated our approach on the challenging miniImageNet few-shot benchmark [8], used for evaluation by most (if not all) few-shot learning works. We also evaluated on the CUB dataset [41], which includes another form of semantics, the attribute vector.

Summary & conclusions

In this work, we have proposed an extended approach to few-shot learning with additional semantic information. We suggest bringing few-shot learning with semantics closer to the setting experienced by human infants: we build on multiple semantic explanations (name, attributes, and description) that accompany the few image examples, and we utilize more complex natural-language semantics rather than just the name of the category. In our experiments, we have only touched the tip of the iceberg of the possible

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (44)

  • W.-Y. Chen, A closer look at few-shot classification, ICLR (2018).
  • O. Russakovsky et al., ImageNet large scale visual recognition challenge, IJCV (2015).
  • L.B. Smith, Learning to recognize objects, Psychol. Sci. (2003).
  • Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, L. Sigal, Semantic feature augmentation in few-shot learning, ...
  • C. Xing et al., Adaptive cross-modal few-shot learning, NeurIPS (2019).
  • J. Pennington et al., GloVe: global vectors for word representation, EMNLP (2014).
  • J. Snell et al., Prototypical networks for few-shot learning, NIPS (2017).
  • O. Vinyals et al., Matching networks for one shot learning, NIPS (2016).
  • P. Welinder et al., Caltech-UCSD Birds 200, Technical Report CNS-TR-2010-001 (2010).
  • O. Rippel, M. Paluri, P. Dollar, L. Bourdev, Metric learning with adaptive density discrimination, arXiv preprint ...
  • V. Garcia, J. Bruna, Few-shot learning with graph neural networks, (2017) 1–13. arXiv preprint ...
  • A. Santoro et al., Meta-learning with memory-augmented neural networks, ICML (2016).
  • F. Sung et al., Learning to compare: relation network for few-shot learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018).
  • F. Hao et al., Collect and select: semantic alignment metric learning for few-shot learning, Proceedings of the IEEE/CVF International Conference on Computer Vision (2019).
  • X. Jiang et al., Learning to learn with conditional class dependencies, ICLR (2018).
  • Z. Ji et al., Information symmetry matters: a modal-alternating propagation network for few-shot learning, IEEE Trans. Image Process. (2022).
  • C. Finn, P. Abbeel, S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, arXiv preprint ...
  • Z. Li, F. Zhou, F. Chen, H. Li, Meta-SGD: learning to learn quickly for few-shot learning, arXiv preprint ...
  • F. Zhou, B. Wu, Z. Li, Deep meta-learning: learning to learn in the concept space, (2018). arXiv preprint ...
  • S. Ravi et al., Optimization as a model for few-shot learning, ICLR (2017).
  • S. Doveh, E. Schwartz, C. Xue, R. Feris, A. Bronstein, R. Giryes, L. Karlinsky, MetAdapt: meta-learned task-adaptive ...
  • A.A. Rusu et al., Meta-learning with latent embedding optimization, ICLR (2018).