Attention Mechanism-based CNN for Facial Expression Recognition
Neurocomputing ( IF 6 ) Pub Date : 2020-10-01 , DOI: 10.1016/j.neucom.2020.06.014
Jing Li , Kan Jin , Dalin Zhou , Naoyuki Kubota , Zhaojie Ju

Abstract: Facial expression recognition is an active research topic with applications in many computer vision fields, such as human–computer interaction and affective computing. In this paper, we propose a novel end-to-end network with an attention mechanism for automatic facial expression recognition. The network architecture consists of four parts: the feature extraction module, the attention module, the reconstruction module and the classification module. Local binary pattern (LBP) features capture image texture information and, through it, the small movements of the face, which can improve network performance. The attention mechanism makes the neural network focus on the more useful features. We combine LBP features with the attention mechanism to strengthen the attention model and obtain better results. In addition, we collected and labelled a new facial expression dataset covering seven expressions from 35 subjects aged 20 to 25. For each subject, we captured both RGB images and depth images with a Microsoft Kinect sensor. For each image type, there are 245 image sequences of 110 images each, i.e., 26,950 images. We apply the proposed method to our own dataset and to four representative expression datasets: JAFFE, CK+, FER2013 and Oulu-CASIA. The experimental results demonstrate the feasibility and effectiveness of the proposed method.
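To make the LBP texture descriptor mentioned in the abstract concrete, here is a minimal sketch of the basic 3x3 LBP operator in plain Python. It is an illustration under assumptions, not the authors' implementation: the abstract does not say which LBP variant, radius, or neighbour ordering the paper uses, and the names `lbp_codes`, `lbp_histogram`, and `OFFSETS` are hypothetical.

```python
from typing import List

# Assumed neighbour order: clockwise (row, col) offsets starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(gray: List[List[int]]) -> List[List[int]]:
    """Return basic 3x3 LBP codes for the interior pixels of a grayscale image."""
    height, width = len(gray), len(gray[0])
    codes = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the center.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes: List[List[int]]) -> List[int]:
    """Count how often each of the 256 possible LBP codes occurs."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


# Usage: a 3x3 patch has exactly one interior pixel (value 5).
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(lbp_codes(patch))  # [[120]]
```

Each code records only which neighbours are at least as bright as the center pixel, so it describes local texture rather than absolute intensity. A histogram of the codes is one common way to summarise a face region; how the paper feeds LBP information into its attention module is not specified in the abstract.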

Updated: 2020-10-01