IETE Technical Review (IF 2.5). Pub Date: 2020-09-08. DOI: 10.1080/02564602.2020.1814168. Xiaoming Zhao 1, Gang Chen 1,2, Yuelong Chuang 1, Xin Tao 1, Shiqing Zhang 1
Facial expression recognition from video sequences is an active research topic in computer vision, pattern recognition, and artificial intelligence. Because of the semantic gap between the hand-designed features extracted from affective videos and subjective emotions, recognizing facial expressions from video sequences remains challenging. To tackle this problem, this paper proposes a new method for facial expression recognition from video sequences based on a deep residual attention network. First, since the intensity of emotional expression differs across the local regions of a facial image, deep residual attention networks, which integrate deep residual networks with a spatial attention mechanism, are employed to learn high-level affective features for each frame of the facial expression images in a video sequence. Then, average pooling is performed to produce a fixed-length global video-level feature representation. Finally, this global video-level representation is fed into a multi-layer perceptron to classify the facial expressions in the video sequence. Experimental results on two public video emotion datasets, BAUM-1s and RML, demonstrate the effectiveness of the proposed method.
Title: Learning Expression Features via Deep Residual Attention Networks for Facial Expression Recognition from Video Sequences
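The pipeline described in the abstract (per-frame spatial attention over backbone feature maps, spatial and temporal average pooling, then an MLP classifier) can be sketched in numpy. This is a minimal illustration under stated assumptions, not the paper's implementation: the backbone feature maps are stubbed with random arrays, the attention mask is a 1x1 projection followed by a sigmoid, and the residual weighting `(1 + mask) * features` follows the common residual-attention formulation; all layer sizes and weights here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, w_att):
    # feat: (H, W, C) feature map, standing in for a deep-residual-network output.
    # A 1x1 projection plus sigmoid yields a per-location mask in [0, 1];
    # the residual form (1 + mask) * feat modulates, rather than replaces,
    # the trunk features, so informative facial regions are emphasized.
    mask = sigmoid(feat @ w_att)              # (H, W, 1)
    return (1.0 + mask) * feat                # (H, W, C)

def video_representation(frames, w_att):
    # frames: (T, H, W, C) per-frame feature maps for one video.
    attended = np.stack([spatial_attention(f, w_att) for f in frames])
    frame_vecs = attended.mean(axis=(1, 2))   # spatial average pool -> (T, C)
    return frame_vecs.mean(axis=0)            # temporal average pool -> (C,)

def mlp_classify(vec, w1, b1, w2, b2):
    # Multi-layer perceptron head: one ReLU hidden layer, softmax output.
    h = np.maximum(vec @ w1 + b1, 0.0)
    logits = h @ w2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                        # class probabilities

# Hypothetical sizes: 8 frames, 7x7x64 feature maps, 6 basic emotion classes.
T, H, W, C, HID, K = 8, 7, 7, 64, 32, 6
frames = rng.standard_normal((T, H, W, C))
w_att = rng.standard_normal((C, 1)) * 0.1
w1, b1 = rng.standard_normal((C, HID)) * 0.1, np.zeros(HID)
w2, b2 = rng.standard_normal((HID, K)) * 0.1, np.zeros(K)

probs = mlp_classify(video_representation(frames, w_att), w1, b1, w2, b2)
```

The fixed-length video-level vector makes the classifier independent of the number of frames `T`, which is why the average pooling step matters for variable-length video sequences.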