Discriminative and informative joint distribution adaptation for unsupervised domain adaptation
Introduction
Traditional supervised learning paradigms perform well under two assumptions: (1) there are enough labeled samples to guarantee that the model is well trained; (2) the training and test data are independent and identically distributed (i.i.d.) [1]. However, these two assumptions do not always hold in practical applications. On the one hand, labeled samples are generally scarce, especially in new domains, because collecting and annotating them requires expensive and time-consuming human labor [2]. On the other hand, the new data cannot be guaranteed to follow the same distribution as the existing labeled data, owing to various factors such as differences in resolution and illumination [3]. For example, a face recognition system trained on high-resolution laboratory images is sometimes applied to recognize low-resolution, noisy surveillance images. In such situations, the performance inevitably degrades.
Although a distribution divergence may exist between the existing labeled samples and the new target samples, these samples still share some common factors. The labeled samples can therefore provide useful knowledge to improve learning on the target data. Learning a classifier by manually annotating the target data from scratch would be time-consuming and costly; in other words, building a well-performing model for the new data without exploiting the information gained from previously related labeled samples is inefficient. How to effectively utilize the available labeled data to promote the learning of newly collected unlabeled samples has thus become a pressing problem. Transfer learning has emerged as the solution, with the objective of “borrowing” well-learned knowledge from auxiliary data and applying it to related target data. Depending on the relationship between the source and target domains as well as their tasks, transfer learning can generally be grouped into four categories: multi-task learning, self-taught learning, domain adaptation learning, and unsupervised transfer learning [4].
As an active branch of transfer learning, domain adaptation has attracted increasing research attention in recent years and has been successfully applied to numerous applications, including but not limited to image classification [5], [6], video concept detection [7], [8], object recognition [9], [10], and action recognition [11], [12]. According to the survey [4], domain adaptation can be categorized into the semi-supervised setting, in which the target domain has very few labeled samples but sufficient unlabeled ones [13], [14], [15], and the unsupervised setting, in which the target data are fully unlabeled [16], [17], [18], [19]. This paper focuses on unsupervised domain adaptation, since target samples without labels are common in real-world applications. Depending on the feature spaces of the source and target domains, unsupervised domain adaptation can be further divided into homogeneous and heterogeneous domain adaptation. In homogeneous domain adaptation, the feature spaces of the source and target domains are identical, with the same dimension [17], [18], [19]; hence, the source and target data differ mainly in their distributions. In heterogeneous domain adaptation, the source and target data are characterized by different sets of features, and the dimensions generally differ as well [16], [20]. Most existing domain adaptation approaches focus on cross-domain learning with homogeneous features. Furthermore, according to the content that is transferred, homogeneous domain adaptation methods fall into three subcategories [4]. The first is instance-based adaptation, motivated by importance sampling, in which the divergence of distributions is mitigated by reweighting or selecting source samples in light of their importance [21], [22].
The second is model-based adaptation, which uses the source-domain data to build a credible classifier whose parameters can be adapted to the target domain [23], [24]. The last is feature representation-based adaptation, which aims to learn a domain-invariant feature representation such that the distributions of both domains are the same or very similar [25], [26], [27]. Once the new feature representations are obtained, any supervised learning algorithm can be adopted for subsequent classification, regression, or clustering.
Many researchers have shown great interest in feature representation-based approaches, since they are computationally simple and generalize to out-of-sample patterns. Moreover, this type of approach can flexibly incorporate various meaningful regularizations to improve its interpretability and performance. Motivated by these desirable properties, this paper studies feature representation adaptation to deal with inconsistent data distributions. A review of the existing feature-based domain adaptation work yields the following motivations for the proposed method: (1) These methods usually reduce the gap between the source and target domains via feature transformation. However, samples from different classes may be mixed together after the transformation, which degrades learning performance. The transformed feature representation therefore needs to be not only discriminative but also sufficiently separable. (2) In many fields, data are represented by high-dimensional feature vectors, which inevitably introduce irrelevant information. Such information may interfere with the classifier and decrease classification accuracy. This naturally motivates us to identify the most representative feature attributes so that source knowledge can be transferred to the target domain more successfully.
With these motivations, this paper proposes a discriminative and informative joint distribution adaptation method based on the maximum margin criterion and a row-sparsity technique (DIJDA for short). Specifically, DIJDA seeks a low-dimensional subspace shared by the involved domains in which the distribution divergence across domains is effectively mitigated by the maximum mean discrepancy (MMD) [28]. Meanwhile, DIJDA fully exploits the label information of the source domain so that features in this subspace are both discriminative and sufficiently separable. Unlike some previous domain adaptation methods that fit the input features to the corresponding labels, DIJDA uses the maximum margin criterion (MMC) [29] to encourage transformed source samples with the same label to form a compact cluster and clusters with different labels to lie far from one another. That is, the distances between transformed samples from the same class are decreased, and those between transformed samples from different classes are increased. Moreover, transformed samples from the same class should be close to each other in the shared subspace, regardless of which domain they originally belong to. To this end, DIJDA introduces a regularization term based on the manifold assumption [30] to characterize the local structure of the data, so that the spatial relationships among samples are preserved in the low-dimensional space. Furthermore, to emphasize the roles of important features and suppress those of irrelevant features, DIJDA draws on sparsity techniques [31] and imposes a row-sparsity constraint on the transformation matrix. This constraint forces entire rows of the transformation matrix that correspond to inessential features toward zero. Thus, DIJDA is capable of identifying informative feature attributes, which helps improve learning performance.
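As a point of reference, under a linear kernel the empirical MMD between two samples reduces to the squared Euclidean distance between their means. The following minimal sketch illustrates this estimate; it is not the authors' implementation, and the names `Xs`/`Xt` are illustrative assumptions:

```python
import numpy as np

def linear_mmd(Xs, Xt):
    """Empirical MMD between source sample Xs and target sample Xt under a
    linear kernel: the squared Euclidean distance between the sample means."""
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 5))   # source-domain samples
Xt = rng.normal(1.0, 1.0, size=(120, 5))   # target-domain samples, shifted mean
# Identical samples yield zero discrepancy; a mean shift yields a positive
# value, which distribution-adaptation methods seek to minimize.
print(linear_mmd(Xs, Xs), linear_mmd(Xs, Xt))
```

Nonlinear kernels generalize this to a distance between kernel mean embeddings, but the linear case already conveys why minimizing MMD aligns the two domains.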
In brief, the proposed method has the following advantages:
The proposed method introduces the maximum margin criterion into domain adaptation learning, reducing the distances between samples from the same class while enlarging the margins between samples from different classes. In this way, samples with the same label form a compact cluster, and clusters of different classes are kept as far apart as possible.
The proposed method imposes a row-sparsity constraint on the transformation matrix, which shrinks the rows corresponding to inessential features toward zero. Thus, the informative feature attributes are identified, which naturally improves learning performance.
Extensive experiments on five benchmark datasets demonstrate the superiority of the proposed method in comparison with three traditional machine learning methods and several state-of-the-art domain adaptation methods (including deep learning methods on the Office-31 and Office-Caltech-10 datasets).
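The two ingredients listed above can be illustrated numerically. The sketch below is a hedged illustration, not the authors' implementation: the maximum margin criterion as the trace difference tr(S_b − S_w) of the between- and within-class scatter matrices, and the row-sparsity measure as the l2,1 norm of a transformation matrix (the sum of the l2 norms of its rows).

```python
import numpy as np

def mmc_score(X, y):
    """Maximum margin criterion J = tr(S_b - S_w), where S_b and S_w are the
    between-class and within-class scatter matrices (illustrative sketch)."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        p = len(Xc) / len(X)                           # class prior
        diff = (Xc.mean(axis=0) - mu)[:, None]
        Sb += p * (diff @ diff.T)                      # between-class scatter
        Sw += p * np.cov(Xc, rowvar=False, bias=True)  # within-class scatter
    return float(np.trace(Sb - Sw))

def l21_norm(W):
    """Row-sparsity-inducing l2,1 norm: sum of the l2 norms of W's rows.
    Rows driven to zero correspond to feature attributes that are dropped."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

# Two tight, well-separated classes give a large positive MMC score.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(mmc_score(X, y))

# A transformation matrix with mostly zero rows has a small l2,1 norm.
W_sparse = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
print(l21_norm(W_sparse))   # 1.0
```

Maximizing tr(S_b − S_w) over a projection while penalizing the projection's l2,1 norm is the standard way these two terms interact: the first shapes the clusters, the second selects the features that define them.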
The remainder of this paper is organized as follows: Section 2 introduces related work on domain adaptation. In Section 3, we present the formulation of the proposed model and its optimization algorithm, followed by the convergence and computational complexity of the algorithm. Extensive experimental results on several image benchmark datasets are presented and analyzed in Section 4. Finally, conclusions and future work are summarized in Section 5.
Related work
In this section, we mainly review a number of representative domain adaptation methods based on feature representation, which are among the most relevant to our proposed approach. These methods can be further categorized into data distribution-centric and subspace-centric methods.
Data distribution centric methods aim to explicitly minimize the distribution difference between the source and target data in a low-dimensional space and preserve the important properties of the original
Discriminative and informative joint distribution adaptation
In this section, we present the discriminative and informative joint distribution adaptation (DIJDA) in detail. After giving the descriptions of notations used in this paper, we formulate the DIJDA model as an optimization problem and give the solution to the model. At last, we analyze the convergence and computational complexity of the algorithm.
Experiments
In this section, we evaluate the effectiveness of the proposed method through extensive experiments on five benchmark datasets. The experimental data are described first, followed by the setup. Then, the experimental results are compared with those of several state-of-the-art methods. Finally, the effectiveness, convergence property, and parameter sensitivity of the model are carefully analyzed.
Conclusions and future works
In this paper, we have proposed a novel domain-invariant feature learning method based on the maximum margin criterion and row-sparsity technique, termed as discriminative and informative joint distribution adaptation (DIJDA), for unsupervised domain adaptation. The proposed DIJDA introduces the maximum margin criterion to maximize the margins of source samples from different classes and minimize those of samples from the same class at the same time in the adaptation process. Moreover, DIJDA
CRediT authorship contribution statement
Liran Yang: Conceptualization, Methodology, Software. Ping Zhong: Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank the reviewers for their valuable comments and suggestions to improve the quality of this paper.
References (55)
- et al., Low-resolution image categorization via heterogeneous domain adaptation, Knowl.-Based Syst. (2019)
- et al., Deep object recognition across domains based on adaptive extreme learning machine, Neurocomputing (2017)
- et al., Joint domain matching and classification for cross-domain adaptation via ELM, Neurocomputing (2019)
- et al., Domain learning joint with semantic adaptation for human action recognition, Pattern Recognit. (2019)
- et al., Sparse feature space representation: A unified framework for semi-supervised and domain adaptation learning, Knowl.-Based Syst. (2018)
- et al., Semi-supervised transfer subspace for domain adaptation, Pattern Recognit. (2018)
- et al., Semi-supervised domain adaptation via Fredholm integral based kernel methods, Pattern Recognit. (2019)
- et al., Soft large margin clustering for unsupervised domain adaptation, Knowl.-Based Syst. (2020)
- et al., Joint metric and feature representation learning for unsupervised domain adaptation, Knowl.-Based Syst. (2020)
- et al., Learning domain-shared group-sparse representation for unsupervised domain adaptation, Pattern Recognit. (2018)
- Transfer subspace learning via low-rank and discriminative reconstruction matrix, Knowl.-Based Syst.
- Structure preservation and distribution alignment in discriminative transfer subspace learning, Neurocomputing
- Transfer independently together: A generalized framework for domain adaptation, IEEE Trans. Cybern.
- Robust visual knowledge transfer via extreme learning machine based domain adaptation, IEEE Trans. Image Process.
- Blind domain adaptation with augmented extreme learning machine features, IEEE Trans. Cybern.
- A survey on transfer learning, IEEE Trans. Knowl. Data Eng.
- Discriminative transfer joint matching for domain adaptation in hyperspectral image classification, IEEE Geosci. Remote Sens. Lett.
- Semi-supervised image-to-video adaptation for video action recognition, IEEE Trans. Cybern.
- Unsupervised heterogeneous domain adaptation via shared fuzzy equivalence relations, IEEE Trans. Fuzzy Syst.
- Domain adaptation problems: A DASVM classification technique and a circular validation strategy, IEEE Trans. Pattern Anal. Mach. Intell.
- Adaptation regularization: A general framework for transfer learning, IEEE Trans. Knowl. Data Eng.