Dual-graph regularized discriminative transfer sparse coding for facial expression recognition

https://doi.org/10.1016/j.dsp.2020.102906

Abstract

Facial expression recognition has recently received increasing attention due to its great potential in real-world applications. Conventional facial expression recognition is often conducted on the assumption that the training and testing data are obtained from the same dataset. In reality, however, the data are often collected from different devices or environments, which severely degrades recognition performance. To tackle this problem, in this paper we investigate the cross-dataset facial expression recognition problem and propose a novel dual-graph regularized transfer sparse coding method (DGTSC). Specifically, aiming to reduce the distribution divergence between different databases while preserving their geometrical structures, we construct a dual-graph, by defining inter-domain and intra-domain similarities, to measure the distance between different databases. Moreover, we further present a dual-graph regularized discriminative transfer sparse coding method (DGDTSC), which exploits the label information to endow our model with more discriminative power. Extensive experimental results and analysis on several facial expression datasets show the feasibility and effectiveness of the proposed methods.

Introduction

The goal of facial expression recognition is to recognize expressions from facial images. It has attracted increasing interest due to its applications in far-reaching fields, e.g., computer vision, multimedia entertainment, and human-computer intelligent interaction [1], [2]. Compared with other channels of emotional expression, facial expressions convey emotional states more expressively, which is why facial expression recognition has drawn more and more attention. In [3], Ekman et al. defined six basic facial expression categories, i.e., anger, disgust, fear, happiness, sadness and surprise, which are illustrated in Fig. 1. Most existing works focus on recognizing these six basic expressions, which are universal and recognizable across different cultures.

Overall, a facial expression recognition system can be roughly partitioned into two major parts: facial expression feature extraction and representation, and facial expression classification. The task of the first part is to extract a set of facial expression features related to human expressions, whereas the second part determines the emotion category based on the extracted features. For the former, many facial expression feature extraction algorithms have been proposed, e.g., the scale-invariant feature transform (SIFT) [4], local binary patterns (LBP) [5], histograms of oriented gradients (HOG) [6], Gabor wavelets [7], facial movement features [8], features from salient facial patches [9], local binary pattern histograms from three orthogonal planes (LBP-TOP) [10] and features extracted by various deep learning algorithms [11]. For the latter, a wide range of approaches have been proposed in the literature, e.g., the support vector machine (SVM) [12], hidden Markov model (HMM) [13], AdaBoost [14] and deep neural networks [11], [15], [16]. These methods achieve satisfying results in most facial expression recognition tasks.
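As a concrete illustration of the feature-extraction stage, the widely used LBP descriptor [5] can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation (basic 3×3 neighbourhood only, without the uniform-pattern mapping or block partitioning commonly used in practice):

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP: threshold the 8 neighbours of each interior pixel
    against the centre pixel and pack the bits into an 8-bit code."""
    c = img[1:-1, 1:-1]
    # Neighbour offsets in clockwise order starting at the top-left.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (n >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img, bins=256):
    """Normalized 256-bin histogram of LBP codes: the texture feature vector."""
    hist, _ = np.histogram(lbp_codes(img), bins=bins, range=(0, bins))
    return hist / hist.sum()

img = np.random.default_rng(4).integers(0, 256, (48, 48)).astype(float)
feat = lbp_histogram(img)
print(feat.shape)  # (256,): one histogram bin per possible 8-bit code
```

In practice the face image is usually divided into blocks and per-block histograms are concatenated, which preserves spatial layout.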

In recent years, sparse coding has been successfully applied to computer vision tasks such as face recognition [17], [18], image classification [19] and image restoration [20]. In computer vision, the high dimensionality of feature vectors is a tricky problem. Sparse coding represents a high-dimensional feature vector as a linear combination of a small number of basis vectors, generating a sparse representation. In this way, the feature vector is described by only a few effective coefficients, which is easy to interpret and greatly reduces the computational cost of subsequent processing. Recent studies [21], [22], [23] have shown that sparse coding is one of the most successful representation models for facial expression recognition. For example, Tariq et al. [21] develop a generic sparse coding feature for non-frontal facial expression recognition. In [22], Jampour et al. present a multi-view facial expression recognition method using local linear regression of sparse codes. In [23], by using the label information, Chanti et al. learn a discriminative dictionary for sparse representation to recognize spontaneous facial expressions.
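The sparse codes described above are typically obtained by solving an l1-regularized least-squares problem. A minimal sketch using ISTA, one standard solver for this problem (not necessarily the one used in the cited works), on toy data:

```python
import numpy as np

def soft_threshold(v, t):
    # Element-wise soft-thresholding: the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, B, lam=0.05, n_iter=200):
    """Solve min_s 0.5*||x - B s||^2 + lam*||s||_1 with ISTA."""
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    s = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ s - x)           # gradient of the smooth term
        s = soft_threshold(s - grad / L, lam / L)
    return s

# Toy feature vector that is an exact combination of two dictionary atoms.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 50))
B /= np.linalg.norm(B, axis=0)             # unit-norm atoms
s_true = np.zeros(50)
s_true[[3, 17]] = [1.5, -2.0]
x = B @ s_true
s = sparse_code_ista(x, B)
# Typically only a few coefficients remain active after thresholding.
print(int(np.sum(np.abs(s) > 1e-3)))
```

The l1 penalty drives most coefficients exactly to zero, which is what makes the resulting representation compact and interpretable.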

The above-mentioned methods are mostly carried out on the assumption that the training and testing data come from the same dataset, i.e., that the data are sampled from a common shared distribution [24]. However, in cross-dataset scenarios this assumption does not hold, and recognition performance suffers a heavy drop. To address this problem, with the development of transfer learning [25], [26], cross-dataset facial expression recognition using transfer learning algorithms has become a hot research topic in recent years. In [27], Yan et al. propose a transfer subspace learning approach that learns a robust feature subspace for cross-dataset facial expression recognition, transferring the knowledge gained from the source domain to the target domain to improve recognition performance. In [28], Chu et al. introduce a selective transfer machine (STM) for personalized facial expression analysis; by re-weighting the training samples, it reduces the mismatch between the training and testing datasets. Zheng et al. [29] propose a transductive transfer regularized least-squares regression (TTRLSR) model to solve the cross-domain facial expression recognition problem. Note that when the above sparse coding methods are applied directly to cross-domain problems, data sampled from different distributions may be quantized into different visual words of the codebook and encoded with different representations [30], which makes the learned dictionary unable to encode images effectively and greatly challenges the robustness of existing sparse coding algorithms on cross-dataset recognition problems. Thus, by exploiting transfer learning techniques, Long et al. [30] develop a transfer sparse coding (TSC) method that combines traditional sparsity constraints with a distance measure, the maximum mean discrepancy (MMD) [31], [32]. However, TSC simply introduces the MMD constraint into sparse coding and neglects both the intra-domain and inter-domain graph structures and the discriminative label information, which are important for classification.
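MMD compares two samples through the distance between their kernel mean embeddings; with a linear kernel it reduces to the squared distance between the domain means. The following minimal sketch (toy data, illustrative only) shows the quantity the MMD constraint penalizes:

```python
import numpy as np

def mmd2_linear(Xs, Xt):
    """Squared MMD with a linear kernel: the squared distance between the
    sample means of the two domains. Xs, Xt: (n_samples, n_features)."""
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

rng = np.random.default_rng(1)
Xs = rng.standard_normal((100, 5))          # "source" samples
Xt = rng.standard_normal((100, 5)) + 2.0    # "target" samples with shifted mean
print(mmd2_linear(Xs, Xs[:50]))             # small: same underlying distribution
print(mmd2_linear(Xs, Xt))                  # large: distributions differ
```

In TSC and in the methods of this paper, an MMD-style term is imposed on the learned sparse codes rather than on raw features, so that minimizing it pulls the source and target code distributions together.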

In this paper, to deal with the challenging cross-dataset facial expression recognition problem, inspired by recent progress in sparse coding and transfer learning, we propose a novel dual-graph regularized discriminative transfer sparse coding (DGDTSC) method for robust facial expression recognition. The core idea of DGDTSC lies in seeking the robust transfer sparse codes to reduce the distribution divergence between two different databases. In this way, the source knowledge can be well adapted to facilitate the target expression recognition. Fig. 2 shows the diagram of our approach.

The major contributions of our work are summarized as follows:

  • Our algorithm provides a unified transfer learning framework, which elegantly combines sparse coding, dual-graph Laplacian regularization and discriminative regularization. Experimental results on cross-dataset facial expression recognition tasks show its superiority.

  • When learning sparse codes, we construct a dual-graph to explicitly measure the inter-domain and intra-domain similarity among different datasets, which can not only provide an effective guidance for similarity measurement of transfer learning, but also preserve the local geometric structural information of features.

  • We jointly take MMD and the dual-graph into account as the distance metric. Therefore, both global and local distance measurements across domains are preserved, which effectively reduces the distribution divergence between different datasets.

  • By utilizing the label information, we further introduce a discriminative constraint that minimizes the intra-class scatter and maximizes the inter-class separability. Thus, the learned sparse representations have more discriminative power.

  • A mathematical model of our proposed method is presented and an efficient optimization algorithm is applied to solve our model. We validate the results by comparing with state-of-the-art transfer learning algorithms. We further demonstrate the effectiveness of our method by a series of experiments.
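To make the dual-graph idea above concrete, the toy sketch below builds a graph Laplacian over pooled source and target samples and evaluates the smoothness penalty tr(S L Sᵀ) that graph regularization contributes to a sparse-coding objective. The data, the plain k-NN similarity, and the single pooled graph are illustrative assumptions; the paper's actual construction weights inter-domain and intra-domain edges separately:

```python
import numpy as np

def knn_graph(X, k=3):
    """Binary k-nearest-neighbour similarity graph over the rows of X."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    W = np.zeros_like(d)
    for i in range(len(X)):
        idx = np.argsort(d[i])[1:k + 1]   # k nearest neighbours, skipping self
        W[i, idx] = 1.0
    return np.maximum(W, W.T)             # symmetrize

def laplacian(W):
    # Unnormalized graph Laplacian L = D - W, which is positive semidefinite.
    return np.diag(W.sum(axis=1)) - W

# Hypothetical toy domains pooled into one sample set.
rng = np.random.default_rng(2)
Xs = rng.standard_normal((10, 4))         # "source" samples
Xt = rng.standard_normal((12, 4))         # "target" samples
X = np.vstack([Xs, Xt])
W = knn_graph(X)
L = laplacian(W)
S = rng.standard_normal((8, len(X)))      # toy sparse codes, one column per sample
reg = np.trace(S @ L @ S.T)               # graph-smoothness penalty on the codes
print(reg >= 0)                           # L is PSD, so the penalty prints True
```

Minimizing this trace term forces samples connected in the graph to receive similar codes, which is how the dual-graph preserves local geometric structure during transfer.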

The rest of this paper is organized as follows: We review the related work on sparse coding and transfer learning in Section 2. In Sections 3 and 4, we introduce our proposed algorithms, i.e., DGTSC and DGDTSC, together with the optimization scheme for learning the sparse representations and dictionaries. The experimental results on cross-database facial expression recognition tasks are presented in Section 5. Finally, we conclude our work in Section 6.


Related work

In this section, we discuss the existing works most closely related to our proposed method, including sparse coding and transfer learning, and show the inherent relationship among these methods.

Proposed methods

In this section, we first present the notation used in this work. Then, we elaborate on the details of our proposed methods.

Optimization algorithm

In this section, we discuss the optimization of DGDTSC. The optimization algorithm of problem (18) is divided into two iterative steps: 1) learning transfer sparse codes S with dictionary B fixed; and 2) learning dictionary B with transfer sparse codes S fixed.
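The two-step alternation above can be sketched on a simplified objective. The sketch below is illustrative only: it omits the MMD, dual-graph and discriminative terms of problem (18) and keeps just the reconstruction and sparsity terms, alternating ISTA code updates with a ridge-regularized least-squares dictionary update:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def alternate(X, n_atoms=8, lam=0.1, outer=10, inner=50):
    """Alternately (1) learn codes S with dictionary B fixed, and
    (2) learn dictionary B with codes S fixed."""
    rng = np.random.default_rng(0)
    B = rng.standard_normal((X.shape[0], n_atoms))
    B /= np.linalg.norm(B, axis=0)
    S = np.zeros((n_atoms, X.shape[1]))
    for _ in range(outer):
        # Step 1: ISTA updates of the sparse codes.
        Lc = np.linalg.norm(B, 2) ** 2
        for _ in range(inner):
            S = soft_threshold(S - B.T @ (B @ S - X) / Lc, lam / Lc)
        # Step 2: ridge-regularized least-squares dictionary update.
        B = X @ S.T @ np.linalg.inv(S @ S.T + 1e-6 * np.eye(n_atoms))
        # Renormalize atoms and rescale codes so the product B @ S is unchanged.
        norms = np.linalg.norm(B, axis=0) + 1e-12
        B /= norms
        S *= norms[:, None]
    return B, S

X = np.random.default_rng(3).standard_normal((15, 40))
B, S = alternate(X)
err = np.linalg.norm(X - B @ S) / np.linalg.norm(X)
print(err < 1.0)   # the objective decreases from the all-zero start, prints True
```

Each step decreases (or leaves unchanged) the overall objective, which is the usual argument for the convergence of such block-coordinate schemes; the full DGDTSC objective is handled the same way with its additional regularizers folded into Step 1.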

Experiments

In this section, we conduct extensive experiments of cross-database facial expression recognition on publicly available facial expression datasets to evaluate our methods.

Conclusion

In this paper, we have proposed a novel transfer sparse coding approach, called dual-graph regularized discriminative transfer sparse coding (DGDTSC), for robust cross-dataset facial expression recognition. An important advantage of our method is constructing a dual-graph to preserve the inter-domain and intra-domain geometrical information, which can effectively reduce the distribution divergence between different datasets. Moreover, we encode the discriminative information into sparse coding,

CRediT authorship contribution statement

Dongliang Chen: Methodology, Experiments, Writing – original draft. Peng Song: Supervision, Methodology, Writing – review & editing, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant 61703360 and the Fundamental Research Funds for the Central Universities under Grant CDLS-2019-01.

Dongliang Chen received the B.S. degree in Computer Science from Yantai University, Yantai, China, in 2018. He is currently pursuing the M.S. degree in Computer Science at Yantai University. His current main research interests include affective computing and transfer learning.

References (67)

  • M. Dahmane et al., Emotion recognition using dynamic grid-based HOG features
  • L. Zhang et al., Facial expression recognition using facial movement features, IEEE Trans. Affect. Comput. (2011)
  • S. Happy et al., Automatic facial expression recognition using features of salient facial patches, IEEE Trans. Affect. Comput. (2015)
  • G. Zhao et al., Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • S. Li et al., Deep facial expression recognition: a survey
  • H. Khalifa et al., Facial expression recognition using SVM classification on mic-macro patterns
  • P.S. Aleksic et al., Automatic facial expression recognition using facial animation parameters and multistream HMMs, IEEE Trans. Inf. Forensics Secur. (2006)
  • A. Majumder et al., Automatic facial expression recognition system using deep network-based data fusion, IEEE Trans. Cybern. (2018)
  • C.-M. Kuo et al., A compact deep learning model for robust facial expression recognition
  • W. John et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • S. Gao et al., Local features are not lonely – Laplacian sparse coding for image classification
  • J. Mairal et al., Non-local sparse models for image restoration
  • U. Tariq et al., Multi-view facial expression recognition analysis with generic sparse coding feature
  • M. Jampour et al., Multi-view facial expressions recognition using local linear regression of sparse codes
  • D.A. Chanti et al., Spontaneous facial expression recognition using sparse representation
  • P. Song, Transfer linear subspace learning for cross-corpus speech emotion recognition, IEEE Trans. Affect. Comput. (2019)
  • S.J. Pan et al., A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2010)
  • L. Shao et al., Transfer learning for visual categorization: a survey, IEEE Trans. Neural Netw. Learn. Syst. (2015)
  • W.-S. Chu et al., Selective transfer machine for personalized facial expression analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
  • W. Zheng et al., Cross-domain color facial expression recognition using transductive transfer subspace learning, IEEE Trans. Affect. Comput. (2018)
  • M. Long et al., Transfer sparse coding for robust image representation
  • A. Gretton et al., A kernel method for the two-sample-problem
  • S.J. Pan et al., Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw. (2011)
Peng Song is currently an associate professor with the School of Computer and Control Engineering, Yantai University, China. He received the B.S. degree in EE from Shandong University of Science and Technology, China, in 2006, and the M.E. and Ph.D. degrees in EE from Southeast University, China, in 2009 and 2014, respectively. His current main research interests include affective computing and pattern recognition.
