LoRRaL: Facial Action Unit Detection Based on Local Region Relation Learning,arXiv - CS - Computer Vision and Pattern Recognition


arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2020-09-23, DOI: arxiv-2009.10892
Ziqiang Shi, Liu Liu, Rujie Liu, Xiaoyu Mi, and Kentaro Murase

End-to-end convolutional representation learning has proven very effective for facial action unit (AU) detection. Considering the co-occurrence and mutual exclusion between facial AUs, in this paper we propose convolutional neural networks with Local Region Relation Learning (LoRRaL), which incorporate latent relationships among AUs into an end-to-end approach to facial AU occurrence detection. LoRRaL consists of three parts: 1) a bi-directional long short-term memory (BiLSTM) network that dynamically and sequentially encodes local AU feature maps; 2) a self-attention mechanism that dynamically computes correspondences among local facial regions and re-aggregates AU feature maps in light of AU co-occurrences and mutual exclusions; and 3) a continuous-state modern Hopfield network that encodes and maps local facial features to more discriminative AU feature maps. All of these networks take the facial image as input and map it to AU occurrences. Our experiments on the challenging BP4D and DISFA benchmarks, without any external data or pre-trained models, yield F1-scores of 63.5% and 61.4% respectively, which shows that the proposed networks can improve performance on the AU detection task.
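The abstract gives no architectural details, so the following is only a minimal sketch of two of the relation-learning ideas it names, not the authors' implementation. It assumes each local facial region has already been reduced to a small feature vector, uses the raw features as queries, keys, and values (no learned projections), and applies the standard continuous-state modern Hopfield update, in which the state is replaced by a softmax-weighted sum of the stored patterns. The function names `self_attention` and `hopfield_retrieve`, the toy feature vectors, and the value of `beta` are all hypothetical.

```python
import math


def softmax(scores, beta=1.0):
    """Numerically stable softmax over a list of floats, scaled by beta > 0."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(regions):
    """Re-aggregate each region feature as an attention-weighted mix of all regions.

    Assumption: queries, keys, and values are the raw region features,
    scored with scaled dot products (scale = 1 / sqrt(feature dimension)).
    """
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in regions:
        weights = softmax([dot(q, k) for k in regions], beta=scale)
        out.append([sum(w * v[j] for w, v in zip(weights, regions)) for j in range(d)])
    return out


def hopfield_retrieve(patterns, query, beta=1.0, steps=1):
    """Modern Hopfield update: state <- sum_i softmax(beta * <p_i, state>)_i * p_i."""
    state = list(query)
    for _ in range(steps):
        weights = softmax([dot(p, state) for p in patterns], beta=beta)
        state = [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(len(state))]
    return state


# Toy example: three hypothetical 2-D region features; the first two are similar.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed = self_attention(regions)                               # context-aware features
retrieved = hopfield_retrieve(regions, [0.1, 0.8], beta=8.0)  # pulled toward [0.0, 1.0]
```

With a large `beta`, the Hopfield update behaves like a nearest-pattern lookup, while a small `beta` blends many stored patterns; self-attention applies the same softmax-weighted aggregation with every region acting as a query in turn.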


Updated: 2020-09-24
