PDA: Proxy-based domain adaptation for few-shot image recognition
Introduction
Image recognition has made tremendous progress in recent years. With the emergence of large-scale visual corpora [1], deep convolutional neural networks (CNNs) [2], [3] have shown powerful representation ability for recognizing objects compared to traditional handcrafted feature descriptors. However, many realistic scenarios still cannot obtain enough annotated data due to limitations such as expensive annotation costs, data privacy, or the lack of a collection pipeline, which restricts the applications of deep learning. Learning from limited supervision has therefore attracted wide attention in the machine learning community as a challenging task. Intuitively, with extremely scarce data, directly training a deep CNN model from scratch is prone to overfitting and poor generalization. Recent few-shot learning approaches thus address the data scarcity problem under a transfer learning scheme, as illustrated in Fig. 1. With only a few annotated samples available in the target categories, few-shot image recognition models aim to transfer general knowledge from a large-scale image set, containing base classes with sufficient data, to the categorization of novel classes.
Most existing few-shot image recognition methods focus on learning a general deep CNN representation [4], [5], [6], [7], [8], [9], [10], [11] and transferring it to the few-shot task by building a target classifier on the feature embeddings of the scarce support data. Specifically, recent works reveal that a good feature embedding [5], [9] is the main contributor to improved few-shot generalization. Other findings [7], [9] also indicate that deeper features can boost few-shot learning performance, as the target intra-class variation is reduced [7], though not removed. We refer to methods that rely on domain-invariant representations without further task adaptation as in-domain few-shot learning approaches. However, such representations are essentially tailored to the source data distribution and transfer well to a target task only if the source and target data distributions are very similar. In fact, most standard few-shot benchmarks respect the in-domain condition implicitly, since source and target classes are sampled from the same canonical image recognition dataset, which ensures little domain discrepancy. Generalization performance can thus be largely guaranteed by learning a good representation with careful model and hyper-parameter selection.
To overcome this shortcoming of the generic few-shot benchmarks, Chen et al. [7] design a more practical evaluation setting in which base and novel classes are sampled from different domains, referred to as the Cross-Domain benchmark. As their empirical results demonstrate, current in-domain few-shot learning algorithms [6], [10], [11] fail to handle the domain shift and are even inferior to a naive finetuning model regarded as a transfer learning baseline. This observation suggests that model adaptation may further benefit generalization in the small-data regime.
Another research topic in previous studies that is highly related to our motivation is domain adaptation (DA) [12], [13], [14], [15], [16]. Typically, the target data can be fully unlabeled [14], [15], [16] or sparsely labeled [12], [13]. Unlike few-shot learning, however, most DA techniques aim to reduce the domain shift within the same classification task, where the source and target classes are identical, so the source and target data can share the same classifier. In few-shot learning, by contrast, the feature extractor trained on the source data is typically reused for task transfer, and the target classifier must be built from scratch with a few support samples. We consider that domain discrepancy intrinsically exists in both the generic and cross-domain few-shot learning settings whenever pre-trained representations are applied to a target task.
To address this limitation of in-domain few-shot learning algorithms, we propose a proxy-based domain adaptation (PDA) scheme with the following novel characteristics: (1) Source-data-free DA. PDA performs domain adaptation without accessing the source data, which is efficient and differs from most conventional domain adaptation settings. Moreover, since few-shot learning holds the promise of data efficiency and fast adaptation [17], independence from the source data in the meta-testing stage is a favorable property. (2) Non-parametric DA. PDA performs task adaptation on the target data with the designed loss, which brings significant performance improvements without introducing extra learnable parametric modules.
Concretely, given the support data, PDA finetunes the pre-trained representation and produces a target classifier by minimizing three objective losses: (1) a joint classification loss that improves model discriminability towards the target classes; (2) a domain-level MMD loss that utilizes the Maximum Mean Discrepancy (MMD) criterion to mitigate the discrepancy between the source and target domains; and (3) a class-level MMD loss that alleviates the intra-class discrepancy of the novel categories of the target task.
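The MMD criterion underlying losses (2) and (3) compares two sets of features through their mean embeddings in a kernel space. The following is a minimal numpy sketch assuming an RBF kernel; the paper's exact kernel choice and proxy construction are not shown in this excerpt, and the names `rbf_kernel`, `mmd2`, and `gamma` are illustrative:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of X (n, d) and Y (m, d)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between two samples: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].

    Vanishes when the two samples come from the same distribution and
    grows with the distribution gap, so minimizing it pulls the two
    feature sets together.
    """
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy
```

For the domain-level loss, the two samples would be target features and source proxies; for the class-level loss, features of different samples within the same novel class.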
Our main contributions are summarized as follows:
- •
We explicitly consider and address domain discrepancy in few-shot image recognition, a problem that has seldom been considered in recent few-shot learning approaches.
- •
To achieve few-shot domain adaptation while avoiding re-accessing the source data in the meta-testing stage, we propose PDA, which simultaneously minimizes the target classification loss and the MMD discrepancy at both the domain and class levels. Detailed ablation studies justify the effectiveness of each component of the designed loss.
- •
PDA achieves state-of-the-art (SOTA) results on multiple few-shot image recognition benchmarks covering both generic and cross-domain scenarios. In the cross-domain scenario in particular, our domain adaptation strategy is especially effective, outperforming the prior SOTA method by 9.17% and 6.09% in the 1-shot and 5-shot settings, respectively.
Optimization-based few-shot learning
Most typical few-shot learning methods build on meta-learning [18] in an episodic training manner, designing an optimization procedure over small-scale data that can quickly transfer knowledge from the meta-training stage to the meta-testing stage. One of the most influential methods, MAML [17], aims to find an optimal model initialization from which only a few finetuning steps are needed to adapt quickly to a novel few-shot task. Meta-LSTM [19] follows the same idea
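The few finetuning steps that MAML-style methods rely on can be illustrated by adapting a linear classifier from a shared initialization. This is only an inner-loop sketch in numpy under assumed names (`inner_loop_adapt`, `lr`, `steps`); MAML additionally meta-learns the initialization through an outer loop, which is omitted here:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inner_loop_adapt(W, X, y, lr=0.1, steps=5):
    """A few gradient steps of cross-entropy finetuning from initialization W.

    W: (d, C) linear classifier weights; X: (n, d) features; y: (n,) labels.
    Returns the task-adapted weights without modifying W in place.
    """
    W = W.copy()
    n = len(y)
    Y = np.eye(W.shape[1])[y]            # one-hot labels
    for _ in range(steps):
        P = softmax(X @ W)               # class probabilities
        W -= lr * X.T @ (P - Y) / n      # gradient of mean cross-entropy
    return W
```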
Methodology
In this section, we first formulate the few-shot image recognition problem and review the nearest neighbor classifier as a baseline method (Section 3.1). We then introduce the pre-training routine on the source data (Section 3.2). Finally, we propose PDA as a domain adaptation approach that simultaneously optimizes the pre-trained representation and a novel target classifier (Section 3.3).
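A nearest neighbor baseline of the kind referenced in Section 3.1 typically assigns each query to the class whose support embeddings are closest. The numpy sketch below assumes class-mean prototypes and Euclidean distance; the paper's exact distance metric is not shown in this excerpt, and the function names are hypothetical:

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    """Class prototypes: the mean embedding of each class's support samples."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def nn_classify(query_feats, protos):
    """Assign each query feature to the nearest prototype (Euclidean)."""
    sq_dists = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return sq_dists.argmin(axis=1)
```

With a frozen pre-trained backbone, this classifier needs no training at all, which is what makes it a natural baseline against which task-adaptation methods such as PDA are compared.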
Experiments and results
In this section, we first introduce the datasets and implementation details. Then we show extensive experimental results compared with previous few-shot learning methods. Our PDA works reasonably well on multiple few-shot image recognition benchmarks, including mini-ImageNet [19], tieredImageNet [38], and Cross-Domain [7] (mini-ImageNet [19] → CUB [39]).
Ablation studies
To understand the significance of each component in the PDA for few-shot image recognition, we design various experiments and analyze the findings in this subsection.
Conclusion
We propose a proxy-based domain adaptation (PDA) scheme for improving few-shot image recognition. Taking into account the domain shift in the transfer learning paradigm, PDA performs task adaptation by simultaneously minimizing the classification error and the discrepancy at both the domain and class levels with respect to the few-shot image recognition task at hand. Notably, PDA does not need to access the source domain data and requires no additional parametric modules during adaptation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (44)
- ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. (2015)
- ImageNet classification with deep convolutional neural networks
- Deep residual learning for image recognition
- A new meta-baseline for few-shot learning (2020)
- Rethinking few-shot image classification: A good embedding is all you need?
- Prototypical networks for few-shot learning
- A closer look at few-shot classification
- TADAM: Task dependent adaptive metric for improved few-shot learning
- SimpleShot: Revisiting nearest-neighbor classification for few-shot learning, arXiv (2019)
- Matching networks for one shot learning
- Learning to compare: Relation network for few-shot learning
- Adapting visual category models to new domains
- Deep domain confusion: Maximizing for domain invariance, arXiv
- Unsupervised domain adaptation by backpropagation
- Learning transferable features with deep adaptation networks
- Deep CORAL: Correlation alignment for deep domain adaptation
- Model-agnostic meta-learning for fast adaptation of deep networks
- Lifelong learning algorithms
- Optimization as a model for few-shot learning
- Meta-learning with latent embedding optimization
- Meta-learning with differentiable convex optimization
- Meta-learning with differentiable closed-form solvers