Local descriptor-based multi-prototype network for few-shot learning
Introduction
In recent years, few-shot learning methods have achieved great success in image classification. These methods aim to learn classifiers that generalize well to unseen classes. However, because labeled data are scarce, it is difficult to train a supervised model directly on the limited examples. Inspired by the fact that humans can learn quickly from a few examples and easily transfer that knowledge to recognize new objects, many machine learning algorithms have been proposed to tackle the few-shot problem by mimicking this generalization ability. The resulting methods can be broadly categorized into three groups: metric-learning based [1], [2], [3], meta-learning based [4], [5], [6], and data-augmentation based [7], [8], [9] methods. Owing to their simplicity and effectiveness, metric-learning based methods in particular have achieved great success in this field, and this paper focuses on this family of methods.
The metric-learning problem [10] is concerned with learning a distance function tuned to a particular task. Such learned metrics have proven useful for nearest-neighbor methods and other techniques that rely on distances or similarities, such as the Euclidean distance or cosine similarity. Metric-learning based few-shot methods usually learn a well-generalized deep embedding network, e.g., Prototypical Nets [3] and Relation Net [11]. The main idea of prototype-based few-shot learning is to construct a prototype for each class and then classify each query sample by comparing its similarity or distance to every prototype. However, existing methods usually represent a class with a single prototype, which is insufficient to capture a class's typically complicated distribution; moreover, an effective prototype is hard to learn from the few samples available per class. In addition, most of these methods rely on image-level feature representations, which may represent classes poorly, while the rich information in local features goes unused. In this paper, we propose a novel Local descriptor-based Multi-Prototype Network (LMPNet) to address these issues. Fig. 1 illustrates the procedure of our method, giving a schematic view of the multiple local descriptor-based prototypes and how they differ from existing single-prototype representations.
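As a concrete illustration of the single-prototype scheme described above (an illustrative sketch in the style of Prototypical Nets, not the authors' code), each class's support embeddings are averaged into one prototype and a query is assigned to the nearest prototype under squared Euclidean distance:

```python
import numpy as np

def prototypes(support, labels, n_way):
    # support: (n_samples, d) embeddings; one mean vector per class
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])

def classify(query, protos):
    # query: (n_query, d); return the index of the nearest prototype
    # under squared Euclidean distance
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

The single mean vector per class is exactly the representation the paper argues is too coarse for complicated class distributions.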
The main contributions of this work are threefold: (1) Different from traditional single prototype-based few-shot learning methods, e.g., Prototypical Nets and Relation Net, we propose a novel multi-prototype learning method that employs rich local descriptors instead of the commonly adopted global features; (2) Instead of using an attention mechanism to learn a powerful global representation of a single image, we innovatively employ the channel squeeze and spatial excitation (sSE) attention module [12] to learn multiple local descriptor-based prototypes per class; (3) The entire framework can be trained end-to-end. Our method outperforms several metric-based methods and achieves competitive performance against meta-learning based methods on miniImageNet and tieredImageNet. More importantly, our results on Stanford Dogs, Stanford Cars, and CUB-200-2010 surpass those of other techniques, demonstrating the strong fine-grained classification capacity of our method.
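The sSE module mentioned in contribution (2) squeezes the channel dimension with a 1×1 convolution into a single spatial map, passes it through a sigmoid, and reweights the feature map position by position. A minimal NumPy sketch of this idea (the function and weight names are ours, not from the paper or [12]):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sse_attention(fmap, w, b=0.0):
    # fmap: (c, h, w) feature map; w: (c,) weights of a 1x1 convolution
    # that squeezes the channels into one spatial attention map.
    attn = sigmoid(np.tensordot(w, fmap, axes=([0], [0])) + b)  # (h, w)
    # Spatially excite: reweight every channel by the per-position score.
    return fmap * attn[None, :, :], attn
```

In a full network `w` and `b` would be learned end-to-end; here they are fixed inputs so the reweighting step stays easy to inspect.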
The rest of the paper is organized as follows. In Section 2, related work about few-shot learning is reviewed. Section 3 gives the definition of the few-shot learning problem. Next, we introduce our proposed LMPNet in Section 4 and analyze the experiments in Section 5. The last section is the conclusion.
Related work
The related work, including metric-learning based, meta-learning based, and data-augmentation based methods, will be reviewed in this section.
Problem definition
Few-shot learning problems typically involve three datasets: the Support Set, the Query Set, and the Auxiliary Set, denoted as S, Q, and A, respectively. The support and query sets, which correspond to the training and testing sets in a generic classification task, share the same label space. If the support set contains N classes and each class contains K samples, the few-shot classification task is called an N-way K-shot task. However, in the support set each class typically comprises only a few labeled samples.
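In practice, N-way K-shot tasks are drawn as episodes from the auxiliary set during training: N classes are sampled, then K support and a handful of query samples per class. A minimal episode sampler (illustrative, not the paper's code):

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query):
    # dataset: dict mapping class label -> list of samples.
    # Returns (support, query) lists of (sample, episode_label) pairs,
    # where episode labels are re-indexed 0..n_way-1.
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for idx, c in enumerate(classes):
        picks = random.sample(dataset[c], k_shot + n_query)
        support += [(x, idx) for x in picks[:k_shot]]
        query += [(x, idx) for x in picks[k_shot:]]
    return support, query
```

A 5-way 1-shot episode with 15 queries per class thus yields 5 support and 75 query samples, matching the common evaluation protocol.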
Our method: local descriptor-based multi-prototype network (LMPNet)
The proposed LMPNet consists of three modules: a feature embedding module, a multi-prototype learning module, and a non-parametric metric classification module, as illustrated in Fig. 2. Both the support and query images are first fed into the feature embedding module to obtain their local descriptor-based representations. Specifically, each image is represented as a set of local descriptors (see Section 4.1 for details). Different from the query images, the local descriptors of each support class are further passed to the multi-prototype learning module.
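To make the local-descriptor view concrete: a convolutional feature map of shape (c, h, w) can be read as h×w local descriptors of dimension c, and spatial attention weights (e.g., from an sSE-style module) can aggregate each support image's descriptors into one prototype, giving several prototypes per class. The sketch below is our interpretation of this idea under those assumptions, not the exact LMPNet formulation:

```python
import numpy as np

def local_descriptors(fmap):
    # fmap: (c, h, w) -> (h*w, c): one c-dim descriptor per spatial position
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T

def multi_prototypes(class_fmaps, attn_maps):
    # class_fmaps: list of (c, h, w) maps for one class's support images;
    # attn_maps: matching (h, w) non-negative spatial attention weights.
    # Each image contributes one attention-weighted descriptor, so a class
    # ends up with k prototypes instead of a single mean vector.
    protos = []
    for fmap, a in zip(class_fmaps, attn_maps):
        d = local_descriptors(fmap)        # (h*w, c)
        wgt = a.reshape(-1)
        wgt = wgt / wgt.sum()              # normalize to a convex combination
        protos.append((d * wgt[:, None]).sum(axis=0))
    return np.stack(protos)                # (k, c)
```

A query's local descriptors can then be matched against all of a class's prototypes rather than one averaged vector.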
Experiments
In this section, we perform extensive experiments on two general few-shot classification datasets, i.e., miniImageNet [14] and tieredImageNet [5], and three fine-grained benchmark datasets, i.e., Stanford Dogs [26], Stanford Cars [27], and CUB-200-2010 [28], to evaluate the proposed LMPNet.
Conclusion
We have proposed the Local descriptor-based Multi-Prototype Network (LMPNet) to improve prototype-based metric-learning methods for few-shot learning. To address the problem that single prototype-based networks may not fully capture the feature information of a class, we utilize multiple local descriptor-based prototypes in the feature embedding stage to mine richer class features. Considering that most prototype-based methods obtain prototypes by a fixed mechanism that takes the mean of each class's features, we instead learn the prototypes adaptively through the sSE attention module.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the Science and Technology Innovation 2030 – “New Generation Artificial Intelligence” Major Project (No. 2018AAA0100900), National Science Foundation of China (No. 61806092), Jiangsu Natural Science Foundation (No. BK20180326) and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
References (39)
- Distribution consistency based covariance metric networks for few-shot learning, AAAI, 2019.
- TADAM: task dependent adaptive metric for improved few-shot learning, NIPS, 2018.
- Prototypical networks for few-shot learning, NIPS, 2017.
- MetaGAN: an adversarial approach to few-shot learning, NIPS, 2018.
- Meta-learning for semi-supervised few-shot classification, ICLR, 2018.
- Optimization as a model for few-shot learning, ICLR, 2017.
- A. Antoniou, A. Storkey, H. Edwards, Data augmentation generative adversarial networks, arXiv...
- Delta-encoder: an effective sample synthesis method for few-shot object recognition, NIPS, 2018.
- F-VAEGAN-D2: a feature generating framework for any-shot learning, CVPR, 2019.
- Metric learning: a survey, Found. Trends Mach. Learn., 2013.
- Learning to compare: relation network for few-shot learning, CVPR.
- Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks, International Conference on Medical Image Computing and Computer-Assisted Intervention.
- Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop.
- Matching networks for one shot learning, NIPS.
- Few-shot learning through an information retrieval lens, NIPS.
- Few-shot learning with graph neural networks, ICLR.
- Scheduled sampling for one-shot learning via matching network, Pattern Recognit.
- Meta-learning with memory-augmented neural networks, ICML.
- Model-agnostic meta-learning for fast adaptation of deep networks, ICML.
Cited by (56)
- Introspective GAN: Learning to grow a GAN for incremental generation and classification, Pattern Recognition, 2024.
- Query-centric distance modulator for few-shot classification, Pattern Recognition, 2024.
- Contrastive enhancement using latent prototype for few-shot segmentation, Digital Signal Processing: A Review Journal, 2024.
- Autonomous perception and adaptive standardization for few-shot learning, Knowledge-Based Systems, 2023.
- Knowledge transduction for cross-domain few-shot learning, Pattern Recognition, 2023.
Hongwei Huang received his Bachelor degree from the Department of Computer Science and Technology at Shanghai Jiaotong University in 2009. He is currently working towards his Ph.D. degree in the Department of Computer Science, Nanjing University. His research interests include machine learning and computer vision, particularly in metric learning, few-shot learning.
Zhangkai Wu received his Master degree from the Software Institute at Nanjing University in 2020. He is currently working towards his Ph.D. degree in the Advanced Analytics Institute, University of Technology Sydney. His research interests include machine learning and analytics.
Wenbin Li received his Ph.D. degree from the Department of Computer Science and Technology at Nanjing University in 2019. He is currently an assistant researcher in the Department of Computer Science and Technology at Nanjing University, China. His research interests include machine learning and computer vision, particularly in metric learning, few-shot learning and their applications to face recognition and image classification.
Jing Huo received her Ph.D. degree from the Department of Computer Science and Technology at Nanjing University in 2017. She is currently an assistant researcher in the Department of Computer Science and Technology at Nanjing University, China. Her research interests are in machine learning and computer vision. Her work currently focuses on metric learning, subspace learning and their applications to heterogeneous face recognition.
Yang Gao received the Ph.D. degree from the Department of Computer Science and Technology, Nanjing University, China, in 2000. Currently, he is a Professor, and also the Deputy Director in the Department of Computer Science and Technology, Nanjing University. He is currently directing the Reasoning and Learning Research Group in Nanjing University. He has published more than 100 papers in top-tired conferences and journals. His current research interests include artificial intelligence and machine learning. He also serves as Program Chair and Area Chair for many international conferences.