Landmark-Aware and Part-based Ensemble Transfer Learning Network for Facial Expression Recognition from Static images
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-04-22, DOI: arxiv-2104.11274
Rohan Wadhawan, Tapan K. Gandhi

Facial expression recognition from static images is a challenging problem in computer vision. Convolutional Neural Networks (CNNs), the state-of-the-art method for various computer vision tasks, have had limited success in predicting expressions from faces under extreme pose, illumination, and occlusion conditions. To mitigate this issue, CNNs are often combined with techniques such as transfer, multi-task, or ensemble learning, which often provide high accuracy at the cost of high computational complexity. In this work, we propose a Part-based Ensemble Transfer Learning network, which models how humans recognize facial expressions by correlating the spatial orientation pattern of the facial features with a specific expression. It consists of five sub-networks, each of which performs transfer learning from one of five subsets of facial landmarks (eyebrows, eyes, nose, mouth, or jaw) to expression classification. We test the proposed network on the CK+, JAFFE, and SFEW datasets, and it outperforms the benchmark on CK+ and JAFFE by 0.51% and 5.34%, respectively. Additionally, it has a total of 1.65M model parameters and requires only $3.28 \times 10^{6}$ FLOPS, which ensures computational efficiency for real-time deployment. Grad-CAM visualizations of the proposed ensemble highlight the complementary nature of its sub-networks, a key design parameter of an effective ensemble network. Lastly, cross-dataset evaluation results reveal that the proposed ensemble has a high generalization capacity: our model trained on the SFEW Train dataset achieves an accuracy of 47.53% on the CK+ dataset, which is higher than what it achieves on the SFEW Valid dataset.
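The part-based ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the widely used 68-point landmark annotation layout for the index ranges, which the abstract does not specify. It also uses simple averaging of per-part class probabilities as a stand-in fusion rule, since the abstract does not say how the five sub-networks are combined. The names `PART_INDICES`, `split_landmarks`, and `ensemble_predict` are hypothetical, and the sub-networks themselves are not modeled; only their output logits are.

```python
import math

# Index ranges in the common 68-point facial landmark layout.
# Assumption: the abstract names the five parts but not the landmark scheme.
PART_INDICES = {
    "jaw": range(0, 17),
    "eyebrows": range(17, 27),
    "nose": range(27, 36),
    "eyes": range(36, 48),
    "mouth": range(48, 68),
}


def split_landmarks(landmarks):
    """Group a list of 68 (x, y) points into the five facial parts."""
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    return {
        part: [landmarks[i] for i in indices]
        for part, indices in PART_INDICES.items()
    }


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(part_logits):
    """Average per-part class probabilities (assumed fusion rule).

    part_logits maps a part name to the logits produced by that part's
    sub-network. Returns (predicted class index, averaged probabilities).
    """
    probs = [softmax(logits) for logits in part_logits.values()]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg
```

In this sketch, each of the five sub-networks contributes equally to the final decision, so a single confident part (for example the mouth) can decide the prediction when the other parts are uninformative.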

Updated: 2021-04-26