PDA: Proxy-based domain adaptation for few-shot image recognition
Introduction
Image recognition has made tremendous progress in recent years. With the emergence of large-scale visual corpora [1], deep convolutional neural networks (CNNs) [2], [3] have shown powerful representation ability for recognizing objects compared to traditional handcrafted feature descriptors. However, many realistic scenarios still cannot obtain enough annotated data due to limitations such as expensive annotation costs, data privacy, or the lack of a collection pipeline, which restricts the applications of deep learning. Learning from limited supervision has therefore attracted wide attention in the machine learning community as a challenging task. Intuitively, with extremely scarce data, directly training a deep CNN model from scratch is prone to overfitting and poor generalization. Recent few-shot learning approaches thus address the data scarcity problem under a transfer learning scheme, as illustrated in Fig. 1. With only a few annotated samples available in the target categories, few-shot image recognition models aim to transfer general knowledge from a large-scale image set, containing base classes with sufficient data, to the categorization of novel classes.
Most existing few-shot image recognition methods focus on learning a general deep CNN representation [4], [5], [6], [7], [8], [9], [10], [11] and transferring it to the few-shot task by building a target classifier on the feature embeddings of the scarce support data. Specifically, recent works reveal that a good feature embedding [5], [9] is the main contributor to improved few-shot generalization. Other findings [7], [9] also indicate that deeper features can boost few-shot learning performance, as the target intra-class variation is reduced [7], though not removed. We refer to methods that rely on domain-invariant representations without further task adaptation as in-domain few-shot learning approaches. However, such representations are essentially tailored to the source data distribution and transfer well to a target task only if the source and target data distributions are very similar. In fact, most standard few-shot benchmarks respect the in-domain condition implicitly, since source and target classes are sampled from the same canonical image recognition dataset, which ensures little domain discrepancy. Generalization performance can thus be largely guaranteed by learning a good representation with careful model and hyper-parameter selection.
To overcome this shortcoming of the generic few-shot benchmarks, Chen et al. [7] design a more practical evaluation setting in which base and novel classes are sampled from different domains, referred to as the Cross-Domain benchmark. As their empirical results demonstrate, current in-domain few-shot learning algorithms [6], [10], [11] fail to handle the domain shift and are even inferior to a naive finetuning model regarded as a transfer learning baseline. This observation suggests that model adaptation may further benefit generalization in the small-data regime.
Another research topic in previous studies that is highly related to our motivation is domain adaptation (DA) [12], [13], [14], [15], [16]. Typically, the target data can be fully unlabeled [14], [15], [16] or sparsely labeled [12], [13]. Unlike few-shot learning, however, most DA techniques aim to reduce the domain shift within the same classification task, where the source and target classes are identical, so the source and target data can share the same classifier. In few-shot learning, by contrast, the feature extractor trained on the source data is typically reused for task transfer, and the target classifier must be built from scratch with a few support samples. We consider that domain discrepancy intrinsically exists in both the generic and cross-domain few-shot learning settings whenever pre-trained representations are applied to a target task.
To address this limitation of in-domain few-shot learning algorithms, we propose a proxy-based domain adaptation (PDA) scheme with the following novel characteristics: (1) Source-data-free DA. PDA performs domain adaptation without accessing the source data, which is efficient and differs from most conventional domain adaptation settings. Moreover, since few-shot learning holds the promise of data efficiency and fast adaptation [17], independence from the source data in the meta-testing stage is a favorable property. (2) Non-parametric DA. PDA performs task adaptation on the target data with the designed loss, which brings significant performance improvements without introducing extra learnable parametric modules.
Concretely, given the support data, PDA finetunes the pre-trained representation and produces a target classifier by minimizing three objective losses: (1) a joint classification loss that improves model discriminability towards the target classes; (2) a domain-level MMD loss that utilizes the Maximum Mean Discrepancy (MMD) criterion to mitigate the discrepancy between the source and target domains; and (3) a class-level MMD loss that alleviates the intra-class discrepancy of the novel categories of the target task.
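The MMD criterion underlying losses (2) and (3) compares two sets of features through their mean embeddings in a kernel space. The following is a minimal numpy sketch assuming an RBF kernel; the paper's exact kernel choice and proxy construction are not shown in this excerpt, and the names `rbf_kernel`, `mmd2`, and `gamma` are illustrative:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of X (n, d) and Y (m, d)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between two samples: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].

    Vanishes when the two samples come from the same distribution and
    grows with the distribution gap, so minimizing it pulls the two
    feature sets together.
    """
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy
```

For the domain-level loss, the two samples would be target features and source proxies; for the class-level loss, features of different samples within the same novel class.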
Our main contributions are summarized as follows:
- •
We explicitly consider and address domain discrepancy in few-shot image recognition, a problem that has seldom been considered in recent few-shot learning approaches.
- •
To achieve few-shot domain adaptation while avoiding re-accessing the source data in the meta-testing stage, we propose PDA, which simultaneously minimizes the target classification loss and the MMD discrepancy at both the domain and class levels. Detailed ablation studies justify the effectiveness of each component of the designed loss.
- •
PDA achieves state-of-the-art (SOTA) results on multiple few-shot image recognition benchmarks covering both generic and cross-domain scenarios. In the cross-domain scenario in particular, our domain adaptation strategy is especially effective, outperforming the prior SOTA method by 9.17% and 6.09% in the 1-shot and 5-shot settings, respectively.
Optimization-based few-shot learning
Most typical few-shot learning methods build on meta-learning [18] in an episodic training manner, designing an optimization procedure over small-scale data that can quickly transfer knowledge from the meta-training stage to the meta-testing stage. One of the most influential methods, MAML [17], aims to find an optimal model initialization from which only a few finetuning steps are needed to adapt quickly to a novel few-shot task. Meta-LSTM [19] follows the same idea
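The few finetuning steps that MAML-style methods rely on can be illustrated by adapting a linear classifier from a shared initialization. This is only an inner-loop sketch in numpy under assumed names (`inner_loop_adapt`, `lr`, `steps`); MAML additionally meta-learns the initialization through an outer loop, which is omitted here:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inner_loop_adapt(W, X, y, lr=0.1, steps=5):
    """A few gradient steps of cross-entropy finetuning from initialization W.

    W: (d, C) linear classifier weights; X: (n, d) features; y: (n,) labels.
    Returns the task-adapted weights without modifying W in place.
    """
    W = W.copy()
    n = len(y)
    Y = np.eye(W.shape[1])[y]            # one-hot labels
    for _ in range(steps):
        P = softmax(X @ W)               # class probabilities
        W -= lr * X.T @ (P - Y) / n      # gradient of mean cross-entropy
    return W
```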
Methodology
In this section, we first formulate the few-shot image recognition problem and review the nearest neighbor classifier as a baseline method (Section 3.1). We then introduce the pre-training routine on the source data (Section 3.2). Finally, we propose PDA as a domain adaptation approach that simultaneously optimizes the pre-trained representation and a novel target classifier (Section 3.3).
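A nearest neighbor baseline of the kind referenced in Section 3.1 typically assigns each query to the class whose support embeddings are closest. The numpy sketch below assumes class-mean prototypes and Euclidean distance; the paper's exact distance metric is not shown in this excerpt, and the function names are hypothetical:

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    """Class prototypes: the mean embedding of each class's support samples."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def nn_classify(query_feats, protos):
    """Assign each query feature to the nearest prototype (Euclidean)."""
    sq_dists = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return sq_dists.argmin(axis=1)
```

With a frozen pre-trained backbone, this classifier needs no training at all, which is what makes it a natural baseline against which task-adaptation methods such as PDA are compared.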
Experiments and results
In this section, we first introduce the datasets and implementation details. Then we show extensive experimental results compared with previous few-shot learning methods. Our PDA works reasonably well on multiple few-shot image recognition benchmarks, including mini-ImageNet [19], tieredImageNet [38], and Cross-Domain [7] (mini-ImageNet [19] → CUB [39]).
Ablation studies
To understand the significance of each component in the PDA for few-shot image recognition, we design various experiments and analyze the findings in this subsection.
Conclusion
We propose a proxy-based domain adaptation (PDA) scheme for improving few-shot image recognition. Taking into account the domain shift in the transfer learning paradigm, PDA performs task adaptation by simultaneously minimizing the classification error and the discrepancy at both the domain and class levels with respect to the few-shot image recognition task at hand. Notably, PDA does not need to access the source domain data and requires no additional parametric modules during adaptation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (44)
- ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. (2015)
- ImageNet classification with deep convolutional neural networks
- Deep residual learning for image recognition
- A new meta-baseline for few-shot learning (2020)
- Rethinking few-shot image classification: A good embedding is all you need?
- Prototypical networks for few-shot learning
- A closer look at few-shot classification
- TADAM: Task dependent adaptive metric for improved few-shot learning
- SimpleShot: Revisiting nearest-neighbor classification for few-shot learning, arXiv (2019)
- Matching networks for one shot learning
- Learning to compare: Relation network for few-shot learning
- Adapting visual category models to new domains
- Deep domain confusion: Maximizing for domain invariance, arXiv
- Unsupervised domain adaptation by backpropagation
- Learning transferable features with deep adaptation networks
- Deep CORAL: Correlation alignment for deep domain adaptation
- Model-agnostic meta-learning for fast adaptation of deep networks
- Lifelong learning algorithms
- Optimization as a model for few-shot learning
- Meta-learning with latent embedding optimization
- Meta-learning with differentiable convex optimization
- Meta-learning with differentiable closed-form solvers