Video-based person-dependent and person-independent facial emotion recognition
Signal, Image and Video Processing (IF 2.3), Pub Date: 2021-01-19, DOI: 10.1007/s11760-020-01830-0
Noushin Hajarolasvadi, Enver Bashirov, Hasan Demirel

Facial emotion recognition is a challenging problem that has attracted the attention of researchers in the last decade. In this paper, we present a system for facial emotion recognition in video sequences and evaluate it in both person-dependent and person-independent cases. The importance of training a personalized model versus a non-personalized one depends on the purpose of the designed system. First, we compute 60 geometric features for the video frames of two datasets, namely the RML and SAVEE databases. Next, k-means clustering is applied to the geometric features to select the k most discriminant frames for each video clip. Then, we employ various classifiers, such as linear support vector machine (SVM) and Gaussian SVM, to find the best representative k. Finally, five pre-trained convolutional neural networks (CNNs), namely VGG-16, VGG-19, ResNet-50, AlexNet, and GoogleNet, are used to evaluate two scenarios: person-dependent and person-independent emotion recognition. Additionally, the effect of geometric features on keyframe selection in the person-dependent and person-independent scenarios is studied for different regions of the face. The features extracted by the CNNs are also visualized using the t-distributed stochastic neighbor embedding algorithm to study their discriminative ability in these scenarios. Experiments show that person-dependent systems achieve higher accuracy and are suitable for use in personalized systems.
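
The keyframe selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how frames are picked once the clusters are formed, so the sketch assumes the frame nearest to each centroid is kept. The function names (`kmeans`, `select_keyframes`), the evenly spaced centroid initialisation, and the toy 2-D features (standing in for the 60 geometric features) are all hypothetical choices made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns a list of k centroids.

    Assumption: centroids start from evenly spaced frames so that the
    result is deterministic. The abstract does not specify initialisation.
    """
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        # Assign every frame to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def select_keyframes(frame_features, k):
    """Return sorted indices of k frames, one per k-means centroid.

    Assumption: the keyframe for a cluster is the frame whose features
    lie closest to that cluster's centroid.
    """
    if not 1 <= k <= len(frame_features):
        raise ValueError("k must be between 1 and the number of frames")
    centroids = kmeans(frame_features, k)
    chosen = set()
    for centroid in centroids:
        candidates = [i for i in range(len(frame_features)) if i not in chosen]
        best = min(candidates, key=lambda i: squared_distance(frame_features[i], centroid))
        chosen.add(best)
    return sorted(chosen)


# Toy clip: 12 frames with 2-D features forming three well-separated groups.
toy_clip = [
    [0, 0], [0, 1], [1, 0], [1, 1],
    [10, 10], [10, 11], [11, 10], [11, 11],
    [20, 0], [20, 1], [21, 0], [21, 1],
]
keyframes = select_keyframes(toy_clip, k=3)
```

On the toy clip, the call returns one frame index from each of the three groups. In a pipeline like the one described, the selected frames would then be passed to the classifiers or CNNs, and k would be tuned by comparing classification accuracy across candidate values.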

Updated: 2021-01-19