Orthogonalization-Guided Feature Fusion Network for Multimodal 2D+3D Facial Expression Recognition
IEEE Transactions on Multimedia (IF 8.4), Pub Date: 2020-01-01, DOI: 10.1109/tmm.2020.3001497
Shisong Lin , Mengchao Bai , Feng Liu , Linlin Shen , Yicong Zhou

As 2D and 3D data present different views of the same face, the features extracted from them can be both complementary and redundant. In this paper, we present a novel and efficient orthogonalization-guided feature fusion network, OGF2Net, that fuses features extracted from 2D and 3D faces for facial expression recognition. 2D texture maps are fed into a 2D feature extraction pipeline (FE2DNet), while the attribute maps generated from the 3D data are concatenated and fed into a 3D feature extraction pipeline (FE3DNet). The two networks are trained separately in the first stage and frozen in the second stage for late feature fusion, which addresses the scarcity of large-scale paired 2D+3D face data. To reduce redundancy between the features extracted from the 2D and 3D streams, we design an orthogonal-loss-guided fusion network that orthogonalizes the features before fusing them. Experimental results show that the proposed method significantly outperforms state-of-the-art algorithms on both the BU-3DFE and Bosphorus databases: accuracies of 89.05% (P1 protocol) and 89.07% (P2 protocol) are achieved on BU-3DFE, and an accuracy of 89.28% is achieved on Bosphorus. A complexity analysis further suggests that our approach achieves higher processing speed while requiring less memory.
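The abstract does not give the exact form of the orthogonal loss, but the idea of penalizing redundancy between the two feature streams before late fusion can be sketched as follows. This is a minimal illustration, assuming a squared-cosine-similarity penalty between the 2D and 3D feature vectors and fusion by concatenation; the function names and the specific penalty are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def orthogonal_loss(f2d, f3d, eps=1e-8):
    """Hypothetical orthogonality penalty: the squared cosine similarity
    between the 2D-stream and 3D-stream feature vectors. Driving this
    toward zero encourages the two streams to encode non-redundant
    (mutually orthogonal) information."""
    f2d = f2d / (np.linalg.norm(f2d) + eps)
    f3d = f3d / (np.linalg.norm(f3d) + eps)
    return float(np.dot(f2d, f3d) ** 2)

def late_fuse(f2d, f3d):
    """Late fusion of the (ideally orthogonalized) features by
    concatenation, yielding the joint representation for classification."""
    return np.concatenate([f2d, f3d])

# Orthogonal features incur no penalty; identical features incur the maximum.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(orthogonal_loss(a, b))   # 0.0
print(late_fuse(a, b).shape)   # (4,)
```

In a training loop this penalty would be added to the expression-classification loss, so the frozen FE2DNet/FE3DNet outputs are projected into decorrelated subspaces before fusion.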

Updated: 2020-01-01