Efficient convolutional neural network with multi-kernel enhancement features for real-time facial expression recognition

  • Original Research Paper
  • Published:
Journal of Real-Time Image Processing

Abstract

Facial expressions are the most direct external manifestation of personal emotions. Unlike many other pattern recognition problems, the feature differences between facial expressions are subtle: general-purpose methods either fail to characterize these differences effectively or carry too many parameters for real-time processing. This paper proposes a lightweight mobile architecture, a multi-kernel feature facial expression recognition network, that balances the speed and accuracy of real-time facial expression recognition. First, a multi-kernel convolution block is designed using three depthwise separable convolution kernels of different sizes in parallel; the small and large kernels extract local details and edge-contour information of facial expressions, respectively. The multi-channel information is then fused to obtain multi-kernel enhancement features that better describe the differences between facial expressions. Second, a "Channel Split" operation is performed on the input of the multi-kernel convolution block, which avoids repeated extraction of redundant information and reduces the number of parameters to one-third of the original. Finally, a lightweight multi-kernel feature expression recognition network is built by alternating multi-kernel convolution blocks with depthwise separable convolutions to further improve feature representation. Experimental results show that the proposed network achieves accuracies of 73.3% and 99.5% on the FER-2013 and CK+ datasets, respectively, and runs at 78 frames per second on 640 × 480 video, surpassing other state-of-the-art methods in both speed and accuracy.
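The block described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the kernel sizes (3/5/7), the even channel split, and the use of random placeholder weights are assumptions for illustration; a real network would learn the weights and typically add pointwise (1 × 1) convolutions for fusion. The sketch shows the essential structure: the input channels are split into three groups, each group passes through a depthwise convolution with a different kernel size, and the branch outputs are fused by channel concatenation, so each branch processes only a third of the channels.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise 'same' convolution: one (k, k) kernel per channel.

    x: (C, H, W) feature map; kernels: (C, k, k).
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2  # 'same' padding for odd kernel sizes
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * kernels[ch])
    return out

def multi_kernel_block(x, kernel_sizes=(3, 5, 7), rng=np.random.default_rng(0)):
    """Sketch of a multi-kernel block with Channel Split.

    Each branch sees only its share of the channels, so the parameter
    count is roughly one third of running every kernel on all channels.
    """
    c = x.shape[0]
    groups = np.array_split(np.arange(c), len(kernel_sizes))
    branches = []
    for idx, k in zip(groups, kernel_sizes):
        # Placeholder weights; in a trained network these are learned.
        kernels = rng.standard_normal((len(idx), k, k)) * 0.01
        branches.append(depthwise_conv2d(x[idx], kernels))
    # Fuse the multi-channel information by concatenating along channels.
    return np.concatenate(branches, axis=0)

x = np.random.randn(6, 16, 16)
y = multi_kernel_block(x)
print(y.shape)  # (6, 16, 16): spatial size and channel count are preserved
```

Because each branch keeps the spatial resolution ('same' padding) and the concatenation restores the original channel count, the block is a drop-in stage that can be alternated with plain depthwise separable convolutions, as the paper describes.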



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61771411 and by the Sichuan Science and Technology Program under Grant No. 2019YJ0449.

Author information

Correspondence to Minze Li or Xiaoxia Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, M., Li, X., Sun, W. et al. Efficient convolutional neural network with multi-kernel enhancement features for real-time facial expression recognition. J Real-Time Image Proc 18, 2111–2122 (2021). https://doi.org/10.1007/s11554-021-01088-w
