Knowledge-Based Systems

Volume 207, 5 November 2020, 106394

Discriminative and informative joint distribution adaptation for unsupervised domain adaptation

https://doi.org/10.1016/j.knosys.2020.106394

Highlights

  • A novel feature learning method DIJDA is proposed for unsupervised domain adaptation.

  • A maximum margin criterion is used to preserve the separability of the samples.

  • A row-sparsity regularization is adopted to identify the informative features.

  • The validity of the proposed method is verified by extensive comparison experiments.

Abstract

Domain adaptation learning has been proposed as an effective technique for leveraging rich supervision knowledge from related domain(s) to learn a reliable classifier for a new domain. One popular class of domain adaptation methods is based on feature representation. However, such methods fail to consider the within-class and between-class relations after obtaining the new representation. In addition, they do not consider the negative effects of features that may be redundant or irrelevant to the final classification. To this end, a novel domain-invariant feature learning method based on the maximum margin criterion and a sparsity technique is proposed in this paper for unsupervised domain adaptation, referred to as discriminative and informative joint distribution adaptation (DIJDA). Specifically, DIJDA adopts the maximum margin criterion in the adaptation process so that the transformed samples are close to those in the same class but segregated from those in different classes. As a result, the discriminative knowledge inferred from source labels can be transferred to the target domain effectively. Moreover, DIJDA imposes a row-sparsity constraint on the transformation matrix, which forces the rows of the matrix corresponding to inessential feature attributes to be all zeros. Therefore, the most informative feature attributes can be extracted. Compared with several state-of-the-art methods, DIJDA substantially improves the classification results on five widely used benchmark datasets, which demonstrates the effectiveness of the proposed method.

Introduction

Traditional supervised learning paradigms perform well under two assumptions: (1) there are enough labeled samples to guarantee that the model is well trained; (2) the training and test data are independent and identically distributed (i.i.d.) [1]. However, these two assumptions do not always hold in practical applications. On the one hand, labeled samples are generally scarce, especially in new domains, as collecting and annotating samples requires expensive and time-consuming human labor [2]. On the other hand, new data cannot be guaranteed to have the same distribution as the existing labeled data due to various factors, such as differences in resolution, illumination, and others [3]. For example, a face recognition system trained on high-resolution laboratory images is sometimes applied to recognize low-resolution and noisy surveillance images. In such situations, the performance inevitably degrades.

Although a distribution divergence may exist between the existing labeled samples and the new target samples, some common factors are shared by them. These labeled samples can therefore provide useful knowledge to improve learning on the target data. Learning a classifier by manually annotating the target data from scratch would be time-consuming and costly. In other words, training a well-performing model for the new data without using the information gained from previously related labeled samples is inefficient. How to effectively utilize the available labeled data to promote learning on newly collected unlabeled samples thus becomes an urgent problem. As a result, transfer learning has emerged as a solution, with the objective of "borrowing" well-learned knowledge from auxiliary data and applying it to related target data. Depending on the relationship between the source and target domains as well as their tasks, transfer learning can generally be grouped into four categories: multi-task learning, self-taught learning, domain adaptation learning, and unsupervised transfer learning [4].

As an active branch of transfer learning, domain adaptation learning has attracted increasing research attention in recent years and has been successfully applied to numerous applications, including but not limited to image classification [5], [6], video concept detection [7], [8], object recognition [9], [10], and action recognition [11], [12]. According to the survey [4], domain adaptation learning can be categorized into the semi-supervised setting, in which the target domain has very few labeled samples but sufficient unlabeled ones [13], [14], [15], and the unsupervised setting, in which the target data are fully unlabeled [16], [17], [18], [19]. This paper focuses on unsupervised domain adaptation, as unlabeled target samples are common in real-world applications. Considering the feature spaces of the source and target domains, unsupervised domain adaptation can be further divided into homogeneous and heterogeneous domain adaptation. In homogeneous domain adaptation problems, the feature spaces of the source and target domains are identical, with the same dimension [17], [18], [19]; hence, the source and target data generally differ in their distributions. In heterogeneous domain adaptation problems, by contrast, the source and target domain data are characterized by different sets of features, and their dimensions may also differ [16], [20]. Many existing domain adaptation approaches focus on cross-domain learning problems with homogeneous features. Furthermore, according to what is transferred, homogeneous domain adaptation methods are divided into three subcategories [4]. The first is instance-based adaptation, which is motivated by importance sampling: the divergence between distributions is mitigated by reweighting or selecting source samples according to their importance [21], [22]. The second is model-based adaptation, which uses the source domain data to build a credible classifier whose parameters can be adapted to the target domain [23], [24]. The last is feature representation-based adaptation, which aims to learn a domain-invariant feature representation such that the distributions of both domains are the same or very similar [25], [26], [27]. Once the new feature representations of the data are obtained, one can adopt any supervised learning algorithm for subsequent classification, regression, or clustering.

Many scholars have shown great interest in feature representation-based approaches, since they are computationally simple and can be generalized to out-of-sample patterns. Moreover, such approaches can flexibly incorporate various meaningful regularizations to improve their interpretability and performance. Based on these admirable properties, this paper studies feature representation adaptation for dealing with inconsistent data distributions. Reviewing the existing feature-based domain adaptation work, the motivations behind the proposed method are as follows: (1) These methods usually reduce the gap between the source and target domains via feature transformation. However, samples from different classes may be mixed together after transformation, which degrades the learning performance. Therefore, the transformed feature representation needs to be not only discriminative but also sufficiently separable. (2) In many fields, data are represented by high-dimensional feature vectors, which inevitably introduce irrelevant information. This information may interfere with the classifier and decrease the classification accuracy, which naturally motivates us to identify the most representative feature attributes so that source knowledge can be transferred to the target domain more successfully.

With these motivations, this paper proposes a discriminative and informative joint distribution adaptation method based on the maximum margin criterion and a row-sparsity technique (DIJDA for short). Specifically, DIJDA seeks a low-dimensional subspace shared by the involved domains, in which the distribution divergence across domains is effectively mitigated by the maximum mean discrepancy (MMD) [28]. Meanwhile, DIJDA fully exploits the label information of the source domain so that features in this subspace are not only discriminative but also sufficiently separable. Different from some previous domain adaptation methods that fit the input features to the corresponding labels, DIJDA encourages the transformed source samples with the same label to form a compact cluster, and clusters with different labels to lie far away from each other, by means of the maximum margin criterion (MMC) [29]. That is, the distances between transformed samples from the same class are decreased, and those between transformed samples from different classes are increased. Moreover, transformed samples should be close to each other in the shared subspace if they are from the same class, regardless of which domain they originally belong to. To this end, DIJDA introduces a regularization term based on the manifold assumption [30] to characterize the local structure of the data, so that the spatial relationships of samples are preserved in the low-dimensional space. Furthermore, to emphasize the roles of important features and lessen those of irrelevant ones, DIJDA draws on the sparsity technique [31] and imposes a row-sparsity constraint on the transformation matrix. This constraint forces entire rows of the transformation matrix corresponding to inessential features toward zero. Thus, DIJDA is capable of identifying informative feature attributes, which helps improve the learning performance.
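To make the distribution-matching step concrete, the following minimal sketch (Python/NumPy, not the authors' code) computes a simple linear-kernel empirical MMD between source and target samples after projection by a transformation matrix; the names Xs, Xt, and A are illustrative assumptions rather than the paper's notation, and DIJDA's full objective additionally involves the MMC, the manifold regularizer, and the row-sparsity constraint.

    import numpy as np

    def projected_mmd(Xs, Xt, A):
        # Project both domains into the shared low-dimensional subspace.
        Zs = Xs @ A
        Zt = Xt @ A
        # Linear-kernel empirical MMD: squared distance between the means
        # of the projected source and target samples.
        diff = Zs.mean(axis=0) - Zt.mean(axis=0)
        return float(diff @ diff)

    # Toy usage with a synthetic mean shift between domains; all
    # quantities here are illustrative stand-ins.
    rng = np.random.default_rng(0)
    Xs = rng.normal(0.0, 1.0, size=(100, 20))   # source samples (ns x d)
    Xt = rng.normal(0.5, 1.0, size=(80, 20))    # target samples (nt x d)
    A = rng.normal(size=(20, 5))                # transformation matrix (d x k)
    print(projected_mmd(Xs, Xt, A))

Minimizing this quantity with respect to A pulls the projected domain means together, which is the basic mechanism behind MMD-based feature adaptation.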

In brief, the proposed method has the following advantages:

  • The proposed method introduces the maximum margin criterion into domain adaptation learning, which reduces the distances between samples from the same class while enlarging those between samples from different classes. In this way, samples with the same label form a compact cluster, and clusters of different classes are kept as far apart as possible (see the sketch after this list).

  • The proposed method imposes a row-sparsity constraint on the transformation matrix, which shrinks the rows of the transformation matrix corresponding to inessential features to zeros. Thus, the informative feature attributes are identified, which naturally improves the learning performance.

  • Extensive experiments on five benchmark datasets demonstrate the superiority of the proposed method in comparison with three traditional machine learning methods and several state-of-the-art domain adaptation methods (including deep learning methods on the Office-31 and Office-Caltech-10 datasets).
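As a rough illustration of the first two ingredients above, the NumPy sketch below (with hypothetical helper names) builds the standard LDA-style within- and between-class scatter matrices that the maximum margin criterion trades off via tr(Sb - Sw), and evaluates the l2,1 norm whose minimization drives entire rows of the transformation matrix to zero; the exact weightings in DIJDA's objective may differ.

    import numpy as np

    def mmc_scatter(X, y):
        # Within-class (Sw) and between-class (Sb) scatter matrices; the
        # maximum margin criterion seeks a projection maximizing tr(Sb - Sw).
        mean_all = X.mean(axis=0)
        d = X.shape[1]
        Sw = np.zeros((d, d))
        Sb = np.zeros((d, d))
        for c in np.unique(y):
            Xc = X[y == c]
            mean_c = Xc.mean(axis=0)
            Sw += (Xc - mean_c).T @ (Xc - mean_c)   # spread within class c
            gap = (mean_c - mean_all).reshape(-1, 1)
            Sb += len(Xc) * (gap @ gap.T)           # spread between classes
        return Sw, Sb

    def l21_norm(A):
        # The l2,1 norm sums the Euclidean norms of the rows of A, so
        # penalizing it drives entire rows (feature attributes) to zero.
        return np.linalg.norm(A, axis=1).sum()

    def surviving_features(A, tol=1e-6):
        # Indices of features whose rows of A are not numerically zero,
        # i.e. the attributes the row-sparsity constraint keeps.
        return np.where(np.linalg.norm(A, axis=1) > tol)[0]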

The remainder of this paper is arranged as follows: Section 2 introduces some related work about domain adaptation. In Section 3, we present the formulation of the proposed model and its optimization algorithm, followed by the convergence and computational complexity of the algorithm. The extensive experiment results on several image benchmark datasets are presented and analyzed in Section 4. Finally, conclusions and future work are summarized in Section 5.

Related work

In this section, we mainly review a number of representative domain adaptation methods based on feature representation, which are the most relevant to our proposed approach. These methods can be further categorized into data distribution centric and subspace centric methods.

Data distribution centric methods aim to explicitly minimize the distribution difference between the source and target data in a low-dimensional space and preserve the important properties of the original

Discriminative and informative joint distribution adaptation

In this section, we present the discriminative and informative joint distribution adaptation (DIJDA) method in detail. After describing the notation used in this paper, we formulate the DIJDA model as an optimization problem and give its solution. Finally, we analyze the convergence and computational complexity of the algorithm.

Experiments

In this section, we evaluate the effectiveness of the proposed method through extensive experiments on five benchmark datasets. The experimental data are first given, followed by the setup. Then, the experimental results compared with several state-of-the-art methods are discussed. Finally, the effectiveness, convergence property and parameter sensitivity of the model are analyzed carefully.

Conclusions and future works

In this paper, we have proposed a novel domain-invariant feature learning method based on the maximum margin criterion and a row-sparsity technique, termed discriminative and informative joint distribution adaptation (DIJDA), for unsupervised domain adaptation. The proposed DIJDA introduces the maximum margin criterion into the adaptation process to simultaneously maximize the margins between source samples from different classes and minimize those between samples from the same class. Moreover, DIJDA

CRediT authorship contribution statement

Liran Yang: Conceptualization, Methodology, Software. Ping Zhong: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank the reviewers for their valuable comments and suggestions to improve the quality of this paper.

References (55)

  • P. Razzaghi et al., Transfer subspace learning via low-rank and discriminative reconstruction matrix, Knowl.-Based Syst. (2019)

  • T. Xiao et al., Structure preservation and distribution alignment in discriminative transfer subspace learning, Neurocomputing (2019)

  • J. Li et al., Transfer independently together: A generalized framework for domain adaptation, IEEE Trans. Cybern. (2018)

  • L. Zhang et al., Robust visual knowledge transfer via extreme learning machine based domain adaptation, IEEE Trans. Image Process. (2016)

  • M. Uzair et al., Blind domain adaptation with augmented extreme learning machine features, IEEE Trans. Cybern. (2017)

  • S.J. Pan et al., A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2009)

  • J. Peng et al., Discriminative transfer joint matching for domain adaptation in hyperspectral image classification, IEEE Geosci. Remote Sens. Lett. (2019)

  • J. Yang, R. Yan, A.G. Hauptmann, Cross-domain video concept detection using adaptive SVMs, in: Proc. ACM MM, 2007, pp....

  • L. Duan, I.W. Tsang, D. Xu, S.J. Maybank, Domain transfer SVM for video concept detection, in: Proc. IEEE CVPR, 2009,...

  • J. Zhang et al., Semi-supervised image-to-video adaptation for video action recognition, IEEE Trans. Cybern. (2017)

  • F. Liu et al., Unsupervised heterogeneous domain adaptation via shared fuzzy equivalence relations, IEEE Trans. Fuzzy Syst. (2018)

  • Y. Cao, M. Long, J. Wang, Unsupervised domain adaptation with distribution matching machines, in: Proc. AAAI, Feb....

  • F. Liu, G. Zhang, J. Lu, Heterogeneous domain adaptation: An unsupervised approach, IEEE Trans. Neural Netw. Learn...

  • J. Huang, A.J. Smola, A. Gretton, K.M. Borgwardt, B. Schölkopf, Correcting sample selection bias by unlabeled data, in:...

  • B. Gong, K. Grauman, F. Sha, Connecting the dots with landmarks: discriminatively learning domain-invariant features...

  • L. Bruzzone et al., Domain adaptation problems: A DASVM classification technique and a circular validation strategy, IEEE Trans. Pattern Anal. Mach. Intell. (2010)

  • M. Long et al., Adaptation regularization: A general framework for transfer learning, IEEE Trans. Knowl. Data Eng. (2014)