Video-Based Facial Expression Recognition using Deep Temporal–Spatial Networks, IETE Technical Review


IETE Technical Review (IF 2.4), Pub Date: 2019-07-25, DOI: 10.1080/02564602.2019.1645620
Xianzhang Pan, Shiqing Zhang, WenPing Guo, Xiaoming Zhao, Yuelong Chuang, Ying Chen, Haibo Zhang


Recognizing facial expressions in video sequences is challenging because of the gap between hand-crafted features and subjective emotions. To bridge this gap, this paper proposes a novel method for video-based facial expression recognition using deep temporal–spatial networks. The method first employs multimodal deep convolutional neural networks (CNNs), comprising a spatial CNN and a temporal CNN, to extract high-level spatial and temporal features from video sequences, respectively. Both networks start from a pre-trained CNN model and are fine-tuned on the target video facial expression data. Specifically, the spatial network learns deep spatial features from the static expression images in a video, while the temporal network learns deep temporal features from optical flow images computed between multiple frames. The extracted spatial and temporal features are then combined in a fusion network to perform video-based facial expression classification. Extensive experiments on two public video-based facial expression datasets, BAUM-1s and RML, demonstrate the promising performance of the proposed method, which outperforms state-of-the-art approaches.
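The two-stream idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the fusion network's structure, so the sketch substitutes plain concatenation followed by a single linear softmax layer. The function names (`mean_pool`, `fuse_and_classify`, `demo`), the feature dimensions, the random stand-in features, and the choice of six classes are all assumptions made for illustration.

```python
import math
import random

NUM_CLASSES = 6  # assumption: the abstract does not state the number of expression classes


def mean_pool(frame_features):
    """Average per-frame feature vectors into one video-level vector."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(spatial_feats, temporal_feats, weights, biases):
    """Concatenate pooled spatial and temporal features, then classify.

    Stand-in for the paper's fusion network: concatenation plus one
    linear layer with softmax (an assumption, not the published design).
    """
    fused = mean_pool(spatial_feats) + mean_pool(temporal_feats)  # list concatenation
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def demo():
    """Run the sketch on random stand-ins for the two CNN streams' outputs."""
    rng = random.Random(0)
    # Hypothetical sizes: 5 frames with 4-dim spatial features,
    # 4 optical flow images with 3-dim temporal features.
    spatial = [[rng.random() for _ in range(4)] for _ in range(5)]
    temporal = [[rng.random() for _ in range(3)] for _ in range(4)]
    weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(NUM_CLASSES)]
    biases = [0.0] * NUM_CLASSES
    return fuse_and_classify(spatial, temporal, weights, biases)
```

Calling `demo()` returns one probability per class. In a real pipeline, the random vectors would be replaced by features from the fine-tuned spatial and temporal CNNs, and the linear layer by a trained fusion model.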


Updated: 2019-07-25
