EmNet: a deep integrated convolutional neural network for facial emotion recognition in the wild, Applied Intelligence


Applied Intelligence (IF 5.3), Pub Date: 2021-01-10, DOI: 10.1007/s10489-020-02125-0
Sumeet Saurav, Ravi Saini, Sanjay Singh

In the past decade, facial emotion recognition (FER) research saw tremendous progress, which led to the development of novel convolutional neural network (CNN) architectures for automatic recognition of facial emotions in static images. Although these networks achieve good recognition accuracy, they incur high computational costs and memory utilization. These issues restrict their deployment in real-world applications, which demand that FER systems run in real time on resource-constrained embedded devices. Thus, to alleviate these issues and to develop a robust and efficient method for automatic recognition of facial emotions in the wild with real-time performance, this paper presents a novel deep integrated CNN model named EmNet (Emotion Network). The EmNet model consists of two structurally similar deep CNN (DCNN) models and their integrated variant, trained together using a joint-optimization technique. For a given facial image, EmNet gives three predictions, which are fused using two fusion schemes, namely average fusion and weighted maximum fusion, to obtain the final decision. To test the efficiency of the proposed FER pipeline on a resource-constrained embedded platform, we optimized the EmNet model and the face detector using the TensorRT SDK and deployed the complete FER pipeline on the Nvidia Xavier device. Our proposed EmNet model, with 4.80M parameters and a 19.3MB model size, attains a notable improvement in accuracy over the current state of the art, with a multi-fold improvement in computational efficiency.
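
The abstract names the two fusion schemes but does not define them, so the following is only a minimal sketch of how three per-model predictions might be combined. It assumes each prediction is a class-probability vector, that average fusion is an element-wise mean, and that weighted maximum fusion scales each vector by a per-model weight before taking the element-wise maximum. The function names, the example probabilities, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_fusion(preds: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-model class-probability vectors."""
    n = len(preds)
    return [sum(column) / n for column in zip(*preds)]


def weighted_max_fusion(
    preds: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Scale each model's vector by its weight, then take the element-wise max.

    Assumption: this is one plausible reading of "weighted maximum fusion";
    the paper's exact formulation may differ.
    """
    if len(preds) != len(weights):
        raise ValueError("need exactly one weight per prediction vector")
    scaled = [[w * p for p in vec] for vec, w in zip(preds, weights)]
    return [max(column) for column in zip(*scaled)]


def decide(scores: Sequence[float]) -> int:
    """Index of the highest fused score, i.e. the final class decision."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical outputs of the two DCNN branches and their integrated variant
# for one face image over three emotion classes.
preds = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
]
weights = [0.5, 0.2, 0.3]  # hypothetical per-model weights

avg_class = decide(average_fusion(preds))
wmax_class = decide(weighted_max_fusion(preds, weights))
```

With these made-up numbers the two schemes disagree: averaging favors the class that the second branch is confident about, while the weighted maximum follows the most heavily weighted branch. This is the kind of difference that would motivate evaluating both schemes.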


Updated: 2021-01-10
