Multi-modal facial expression feature based on deep-neural networks
Journal on Multimodal User Interfaces (IF 2.9), Pub Date: 2019-07-17, DOI: 10.1007/s12193-019-00308-9
Wei Wei , Qingxuan Jia , Yongli Feng , Gang Chen , Ming Chu

Emotion recognition based on facial expression is a challenging research topic and has attracted a great deal of attention in the past few years. This paper presents a novel method that uses a multi-modal strategy to extract emotion features from facial expression images. The basic idea is to combine a low-level empirical feature and a high-level self-learning feature into a multi-modal feature. The 2-dimensional coordinates of facial key points are extracted as the low-level empirical feature, and the high-level self-learning feature is extracted by convolutional neural networks (CNNs). To reduce the number of free parameters of the CNNs, small filters are used in all convolutional layers; since a stack of small filters is equivalent to a single large filter, this effectively reduces the number of parameters to learn. In addition, label-preserving transformations are used to enlarge the dataset artificially, in order to address the over-fitting and data imbalance problems of deep neural networks. The two modal features are then fused linearly to form the facial expression feature. Extensive experiments are conducted on the extended Cohn–Kanade (CK+) dataset. For comparison, three kinds of feature vectors are adopted: the low-level facial key-point feature vector, the high-level self-learning feature vector, and the multi-modal feature vector. The experimental results show that the multi-modal strategy achieves encouraging recognition results compared to the single-modal strategy.
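The abstract only outlines the pipeline, so the following is a minimal sketch rather than the authors' implementation. It assumes PyTorch, 48×48 grayscale face crops, 68 facial key points, and reads "fused linearly" as a weighted concatenation of the two feature vectors; the network shape, feature dimensions, and the fusion weight alpha are all hypothetical.

```python
import torch
import torch.nn as nn

class SmallFilterCNN(nn.Module):
    """High-level self-learning branch: stacked 3x3 filters instead of large ones.
    Two stacked 3x3 convolutions cover the receptive field of one 5x5 filter
    with fewer parameters to learn."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 48x48 -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 24x24 -> 12x12
        )
        self.fc = nn.Linear(64 * 12 * 12, feat_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def fuse_features(landmarks_xy, cnn_feat, alpha=0.5):
    """Linear fusion: the low-level landmark feature (68 points -> 136-d vector)
    and the high-level CNN feature are weighted and concatenated into one
    multi-modal feature vector."""
    low = landmarks_xy.flatten(1)                       # (B, 136)
    return torch.cat([alpha * low, (1 - alpha) * cnn_feat], dim=1)

# Usage with dummy shapes; in practice the key points would come from a
# facial landmark detector (e.g., dlib), not random tensors.
imgs = torch.randn(8, 1, 48, 48)   # batch of face crops
pts  = torch.randn(8, 68, 2)       # 2-D key-point coordinates per face
feat = fuse_features(pts, SmallFilterCNN()(imgs))
print(feat.shape)                  # torch.Size([8, 264])
```

Label-preserving augmentation (e.g., small rotations, shifts, or horizontal flips of the face crops) would then be applied to the training images before this pipeline to enlarge the dataset without changing the expression labels.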
