
A discriminative deep association learning for facial expression recognition

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Deep-learning-based facial expression recognition has become increasingly successful in many applications. However, the lack of labeled data remains a bottleneck for recognition performance. It is therefore of practical significance to exploit abundant unlabeled data when training deep neural networks (DNNs). In this paper, we propose a novel discriminative deep association learning (DDAL) framework. Unlabeled data are used together with labeled data to train the DNNs simultaneously, in a multi-loss deep network based on association learning. Moreover, a discrimination loss is used to encourage intra-class clustering and the separation of inter-class centers. Furthermore, a large synthetic facial expression dataset is generated and used as unlabeled data. Exploiting the association learning mechanism yields competitive results on two facial expression datasets, and using the synthetic data clearly improves performance.
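To make the association idea concrete, the following is a minimal sketch of the walker and visit losses commonly used in learning by association, written in plain Python. It is not the DDAL implementation: the function names, the toy two-dimensional embeddings, and the smoothing constant `eps` are assumptions made for illustration, and the discrimination loss mentioned in the abstract is not reproduced because its exact form is not given here.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(x):
    return [list(col) for col in zip(*x)]

def matmul(x, y):
    """Product of an n x k matrix and a k x m matrix (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def association_losses(labeled, labels, unlabeled, eps=1e-8):
    """Return (walker_loss, visit_loss) for embedding lists.

    labeled:   n embeddings with known classes (labels)
    unlabeled: m embeddings without classes
    eps is a hypothetical smoothing constant that avoids log(0).
    """
    n, m = len(labeled), len(unlabeled)
    sim = matmul(labeled, transpose(unlabeled))    # n x m dot-product similarities
    p_ab = [softmax(r) for r in sim]               # labeled -> unlabeled step
    p_ba = [softmax(r) for r in transpose(sim)]    # unlabeled -> labeled step
    p_aba = matmul(p_ab, p_ba)                     # n x n round-trip probabilities

    # Walker loss: a round trip should end on a labeled sample of the same
    # class, so compare p_aba with a uniform target over same-class samples.
    walker = 0.0
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i]]
        walker -= sum(math.log(p_aba[i][j] + eps) for j in same) / len(same)
    walker /= n

    # Visit loss: every unlabeled sample should be visited about equally often.
    visit_prob = [sum(p_ab[i][j] for i in range(n)) / n for j in range(m)]
    visit = -sum(math.log(p + eps) for p in visit_prob) / m
    return walker, visit

# Toy usage: two well-separated classes and two unlabeled points.
labeled = [[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]]
unlabeled = [[5.0, 0.0], [0.0, 5.0]]
walker_good, visit_good = association_losses(labeled, [0, 0, 1, 1], unlabeled)
walker_bad, _ = association_losses(labeled, [0, 1, 0, 1], unlabeled)
print(walker_good < walker_bad)
```

In a multi-loss network, terms like these would typically be added to the supervised classification loss with weighting coefficients; those weights, like everything else in this sketch, would have to be taken from the full article rather than from this page.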

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61872188, U1713208, 61602244, 61672287, 61702262, and 61773215, and by the China Postdoctoral Science Foundation under Grant No. 2018M643183.

Author information

Corresponding author

Correspondence to Zhong Jin.

About this article

Cite this article

Jin, X., Sun, W. & Jin, Z. A discriminative deep association learning for facial expression recognition. Int. J. Mach. Learn. & Cyber. 11, 779–793 (2020). https://doi.org/10.1007/s13042-019-01024-2

  • DOI: https://doi.org/10.1007/s13042-019-01024-2
