Pattern Recognition, Volume 116, August 2021, 107935

Local descriptor-based multi-prototype network for few-shot learning

https://doi.org/10.1016/j.patcog.2021.107935

Highlights

  • We present a simple method to effectively approximate the underlying distribution of a class through multi-prototype learning.

  • We use the channel squeeze and spatial excitation mechanism to selectively emphasise informative local descriptors and suppress less useful ones.

  • Our method outperforms some metric-based FSL methods and performs competitively with meta-learning based methods on multiple benchmarks.

Abstract

Prototype-based few-shot learning methods are appealing because they are simple yet effective for handling any-shot problems, and many prototype-related works have since followed. However, these traditional prototype-based methods generally use a single prototype to represent a class, which cannot effectively estimate the complicated distribution of that class. To tackle this problem, we propose a novel Local descriptor-based Multi-Prototype Network (LMPNet), a well-designed framework that generates an embedding space with multiple prototypes. Specifically, the proposed LMPNet employs local descriptors to represent each image, which capture more informative and subtler cues than the commonly adopted image-level features. Moreover, to alleviate the uncertainty introduced by the fixed construction (averaging over samples) of prototypes, we introduce a channel squeeze and spatial excitation (sSE) attention module to learn multiple local descriptor-based prototypes for each class through end-to-end learning. Extensive experiments on both few-shot and fine-grained few-shot image classification tasks have been conducted on various benchmark datasets, including miniImageNet, tieredImageNet, Stanford Dogs, Stanford Cars, and CUB-200-2010. On these datasets, LMPNet shows tangible performance improvements over the baseline models.

Introduction

In recent years, few-shot learning methods for image classification have achieved great success. These methods aim to learn classifiers that generalize well to unseen classes. However, due to the scarcity of labeled data, it is difficult to directly train a supervised model. Inspired by the fact that humans can learn quickly from a few examples and easily transfer that knowledge to recognize new objects, many machine learning algorithms have been proposed to tackle the few-shot problem by mimicking this ability to generalize to new categories. Accordingly, a variety of excellent few-shot learning methods have been proposed in this field, which can be broadly categorized into three groups: metric-learning based [1], [2], [3], meta-learning based [4], [5], [6], and data-augmentation based [7], [8], [9] methods. In particular, because of their simplicity and effectiveness, metric-learning based methods have achieved great success in this field. In this paper, we mainly focus on this kind of method.

The metric-learning problem [10] is concerned with learning a distance function tuned to a particular task. It has proven useful for nearest-neighbor methods and other techniques that rely on distances or similarities, such as the Euclidean distance or cosine similarity. Metric-learning based few-shot learning methods, such as Prototypical Nets [3] and Relation Net [11], usually learn a deep embedding network that generalizes well. The main idea of prototype-based few-shot learning methods is to construct a prototype for each class and then classify each query sample according to its similarity or distance to each prototype. However, existing methods usually use only a single prototype to represent a class, which is insufficient to capture a class's typically complicated distribution. Moreover, it is generally difficult to learn an effective prototype from the limited samples in each class. In addition, most of these methods rely on image-level feature representations, which may be insufficient to represent classes effectively, while the rich information carried by local features is not fully exploited. In this paper, we propose a novel Local descriptor-based Multi-Prototype Network (LMPNet) to address the above issues. Fig. 1 illustrates our method, giving a schematic view of the multiple local descriptor-based prototypes and how they differ from existing single-prototype representations.
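To make the single-prototype baseline concrete, the sketch below computes one mean prototype per class from the support embeddings and scores each query by its negative Euclidean distance to every prototype, in the spirit of Prototypical Nets [3]; the function name and tensor shapes are illustrative assumptions rather than code from the paper.

```python
import torch

def prototype_classify(support: torch.Tensor,
                       support_labels: torch.Tensor,
                       query: torch.Tensor,
                       n_way: int) -> torch.Tensor:
    """Single-prototype baseline: one mean embedding per class, queries
    assigned to the nearest prototype under Euclidean distance.

    support: (N*K, D) support embeddings, support_labels: (N*K,) in [0, N),
    query: (Q, D) query embeddings.  Returns (Q, N) class probabilities.
    """
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                               # (N, D)
    # Negative Euclidean distance serves as the classification score.
    return (-torch.cdist(query, prototypes)).softmax(dim=-1)
```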

The main contributions of this work are threefold: (1) Different from traditional single-prototype few-shot learning methods, e.g., Prototypical Nets and Relation Net, we propose a novel multi-prototype learning method that employs rich local descriptors instead of the commonly adopted global features; (2) Instead of using an attention mechanism to learn a powerful global representation for a single image, we employ the channel squeeze and spatial excitation (sSE) attention module [12] to learn multiple local descriptor-based prototypes for each class; (3) The entire framework can be trained in an end-to-end manner. Our method outperforms several metric-based methods and is competitive with other meta-learning based methods on miniImageNet and tieredImageNet. More importantly, our results on Stanford Dogs, Stanford Cars, and CUB-200-2010 surpass other techniques, demonstrating the strong fine-grained classification ability of our method.
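For reference, here is a minimal PyTorch sketch of the generic sSE block of [12] underlying contribution (2): a 1×1 convolution squeezes the channel dimension into a single map, and a sigmoid converts it into per-location weights that re-scale the feature map. How LMPNet turns such spatial attention into multiple prototypes is described in Section 4; the class name below is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SpatialSE(nn.Module):
    """Channel squeeze & spatial excitation (sSE): a 1x1 convolution
    squeezes C channels to a single map, and a sigmoid turns it into
    per-location weights that re-scale the input feature map."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> attention map: (B, 1, H, W), broadcast over C
        attn = torch.sigmoid(self.squeeze(x))
        return x * attn  # emphasise informative locations, suppress others
```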

The rest of the paper is organized as follows. Section 2 reviews related work on few-shot learning. Section 3 defines the few-shot learning problem. Section 4 introduces our proposed LMPNet, and Section 5 presents and analyzes the experiments. The last section concludes the paper.

Section snippets

Related work

The related work, including metric-learning based, meta-learning based, and data-augmentation based methods, will be reviewed in this section.

Problem definition

Few-shot learning problems typically involve three datasets: the Support Set, Query Set, and Auxiliary Set, denoted as S, Q, and A, respectively. The support and query sets, which correspond to the training and testing sets in a generic classification task, share the same label space. If the support set S contains N classes and each class contains K samples, the few-shot classification task is called an N-way K-shot task. However, in the support set S, each class typically comprises
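As an illustration of this episodic setting, the sketch below samples one N-way K-shot episode, assuming the source data is an iterable of (image, label) pairs; all names are illustrative and not taken from the paper.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_query=15):
    """Sample one N-way K-shot episode; `dataset` is assumed to be an
    iterable of (image, label) pairs.  The support and query splits
    share the same N randomly chosen classes."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + q_query)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
```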

Our method: local descriptor-based multi-prototype network (LMPNet)

The proposed LMPNet consists of three modules: a feature embedding module, a multi-prototype learning module, and a non-parametric metric classification module, as illustrated in Fig. 2. Both the support and query images are first fed into the feature embedding module fφ to obtain their corresponding local descriptor-based representations. To be specific, each image is represented as a set of local descriptors (see Section 4.1 for details). Different from the query image, each class Si of
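Because the snippet above is truncated, the following sketch only illustrates the general idea under stated assumptions: a convolutional feature map is viewed as a set of H×W local descriptors, each class is summarized by M prototypes, and a query is scored by matching every local descriptor to its nearest prototype within a class and averaging the negative matched distances. This matching-and-averaging rule is our assumption, not necessarily the non-parametric metric classifier used by LMPNet.

```python
import torch

def to_local_descriptors(feature_map: torch.Tensor) -> torch.Tensor:
    """View a conv feature map of shape (B, C, H, W) as B sets of
    H*W C-dimensional local descriptors, i.e. shape (B, H*W, C)."""
    return feature_map.flatten(2).transpose(1, 2)

def multi_prototype_scores(query_desc: torch.Tensor,
                           prototypes: torch.Tensor) -> torch.Tensor:
    """Score one query image against per-class prototype sets.

    query_desc: (L, C) local descriptors of a query image.
    prototypes: (N, M, C), i.e. M learned prototypes for each of N classes.
    Returns (N,) class scores (higher is better).
    """
    n, m, c = prototypes.shape
    # Distances between every descriptor and every prototype: (L, N*M)
    dists = torch.cdist(query_desc, prototypes.reshape(n * m, c))
    dists = dists.view(-1, n, m)                    # (L, N, M)
    # Match each descriptor to its nearest prototype within each class,
    # then average over descriptors to obtain one score per class.
    return -dists.min(dim=-1).values.mean(dim=0)
```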

Experiments

In this section, we perform extensive experiments to evaluate the proposed LMPNet on two common few-shot classification datasets, i.e., miniImageNet [14] and tieredImageNet [5], and three fine-grained benchmark datasets, i.e., Stanford Dogs [26], Stanford Cars [27], and CUB-200-2010 [28].

Conclusion

We have proposed the Local descriptor-based Multi-Prototype Network (LMPNet) to improve prototype-based metric-learning methods for few-shot learning. To address the problem that single-prototype networks may not fully capture the feature information of a class, we employ multiple local descriptor-based prototypes in the feature embedding stage to mine richer class features. Considering that most prototype-based methods obtain prototypes by a fixed mechanism that takes the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Science and Technology Innovation 2030 – “New Generation Artificial Intelligence” Major Project (No. 2018AAA0100900), National Science Foundation of China (No. 61806092), Jiangsu Natural Science Foundation (No. BK20180326) and the Collaborative Innovation Center of Novel Software Technology and Industrialization.


References (39)

  • W. Li et al., Distribution consistency based covariance metric networks for few-shot learning, AAAI (2019)
  • B.N. Oreshkin et al., TADAM: task dependent adaptive metric for improved few-shot learning, NIPS (2018)
  • J. Snell et al., Prototypical networks for few-shot learning, NIPS (2017)
  • R. Zhang et al., Metagan: an adversarial approach to few-shot learning, NIPS (2018)
  • M. Ren et al., Meta-learning for semi-supervised few-shot classification, ICLR (2018)
  • S. Ravi et al., Optimization as a model for few-shot learning, ICLR (2017)
  • A. Antoniou, A. Storkey, H. Edwards, Data augmentation generative adversarial networks, arXiv...
  • E. Schwartz et al., Delta-encoder: an effective sample synthesis method for few-shot object recognition, NIPS (2018)
  • Y. Xian et al., F-VAEGAN-D2: a feature generating framework for any-shot learning, CVPR (2019)
  • B. Kulis, Metric learning: a survey, Found. Trends Mach. Learn. (2013)
  • F. Sung et al., Learning to compare: relation network for few-shot learning, CVPR (2018)
  • A.G. Roy et al., Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks, International Conference on Medical Image Computing and Computer-Assisted Intervention (2018)
  • G. Koch et al., Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop (2015)
  • O. Vinyals et al., Matching networks for one shot learning, NIPS (2016)
  • E. Triantafillou et al., Few-shot learning through an information retrieval lens, NIPS (2017)
  • V.G. Satorras et al., Few-shot learning with graph neural networks, ICLR (2018)
  • L. Zhang et al., Scheduled sampling for one-shot learning via matching network, Pattern Recognit. (2019)
  • A. Santoro et al., Meta-learning with memory-augmented neural networks, ICML (2016)
  • C. Finn et al., Model-agnostic meta-learning for fast adaptation of deep networks, ICML (2017)

Hongwei Huang received his Bachelor's degree from the Department of Computer Science and Technology at Shanghai Jiaotong University in 2009. He is currently working towards his Ph.D. degree in the Department of Computer Science, Nanjing University. His research interests include machine learning and computer vision, particularly metric learning and few-shot learning.

Zhangkai Wu received his Master's degree from the Software Institute at Nanjing University in 2020. He is currently working towards his Ph.D. degree in the Advanced Analytics Institute, University of Technology Sydney. His research interests include machine learning and analytics.

Wenbin Li received his Ph.D. degree from the Department of Computer Science and Technology at Nanjing University in 2019. He is currently an assistant researcher in the Department of Computer Science and Technology at Nanjing University, China. His research interests include machine learning and computer vision, particularly metric learning, few-shot learning, and their applications to face recognition and image classification.

Jing Huo received her Ph.D. degree from the Department of Computer Science and Technology at Nanjing University in 2017. She is currently an assistant researcher in the Department of Computer Science and Technology at Nanjing University, China. Her research interests are in machine learning and computer vision. Her work currently focuses on metric learning, subspace learning, and their applications to heterogeneous face recognition.

Yang Gao received his Ph.D. degree from the Department of Computer Science and Technology, Nanjing University, China, in 2000. He is currently a Professor and the Deputy Director of the Department of Computer Science and Technology, Nanjing University, where he directs the Reasoning and Learning Research Group. He has published more than 100 papers in top-tier conferences and journals. His current research interests include artificial intelligence and machine learning. He also serves as Program Chair and Area Chair for many international conferences.

1 Equal contribution.

2 Joint corresponding author.
