A Two-Stage Spatiotemporal Attention Convolution Network for Continuous Dimensional Emotion Recognition From Facial Video
IEEE Signal Processing Letters ( IF 3.9 ) Pub Date : 2021-03-03 , DOI: 10.1109/lsp.2021.3063609
Min Hu , Qian Chu , Xiaohua Wang , Lei He , Fuji Ren

Continuous dimensional emotion recognition from facial video sequences is a crucial and challenging task in Affective Computing and Human-Computer Intelligent Interaction. The key to this task is to extract and discriminate spatiotemporal features effectively and in a fine-grained way. In this paper, a Two-Stage Spatiotemporal Attention Temporal Convolution Network (TS-SATCN) is designed for continuous dimensional emotion recognition from facial videos. The first stage generates an initial recognition result that is then fed into the second stage for correction. In each stage, the introduced spatiotemporal attention branch helps the network learn different attention levels and focus adaptively on the informative spatiotemporal features. The network is trained with a proposed smooth loss function, which further improves the quality of the predictions. Extensive experiments on two datasets, RECOLA and AFEW-VA, show that the proposed method achieves significant improvement over state-of-the-art methods.
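The abstract does not specify the exact form of the smooth loss. A common way to encourage temporally smooth predictions for continuous (e.g. valence/arousal) regression is to add a frame-to-frame difference penalty to a standard regression loss; the sketch below illustrates that idea only. The function name `smooth_loss` and the weight `lam` are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def smooth_loss(pred: np.ndarray, target: np.ndarray, lam: float = 0.1) -> float:
    """Illustrative smooth loss (assumption, not the paper's definition):
    mean squared error plus a penalty on frame-to-frame jumps in the
    prediction, encouraging temporally smooth emotion trajectories."""
    # Standard per-frame regression error
    mse = np.mean((pred - target) ** 2)
    # First-order temporal difference of consecutive predictions;
    # large jumps between adjacent frames are penalized
    smoothness = np.mean(np.diff(pred) ** 2)
    return float(mse + lam * smoothness)
```

With the same per-frame error, a jittery prediction sequence incurs a higher loss than a smooth one, which matches the stated goal of improving prediction quality over time.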

Updated: 2021-04-23