Depth as Attention for Face Representation Learning
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2021-01-21, DOI: 10.1109/tifs.2021.3053458
Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan, Ali Etemad

Face representation learning solutions have recently achieved great success for various applications such as verification and identification. However, face recognition approaches based purely on RGB images rely solely on intensity information, and are therefore more sensitive to facial variations, notably pose and occlusions, and to environmental changes such as illumination and background. A novel depth-guided attention mechanism is proposed for deep multi-modal face recognition using low-cost RGB-D sensors. Our attention mechanism directs the deep network "where to look" for visual features in the RGB image by focusing its attention using depth features extracted by a Convolutional Neural Network (CNN). The depth features help the network focus on regions of the face in the RGB image that contain more prominent person-specific information. Our attention mechanism then uses this correlation to generate an attention map for RGB images from the depth features extracted by the CNN. We test our network on four public datasets, showing that the features obtained by our proposed solution yield better results on the Lock3DFace, CurtinFaces, IIIT-D RGB-D, and KaspAROV datasets, which include challenging variations in pose, occlusion, illumination, expression, and time lapse. Our solution achieves average (increased) accuracies of 87.3% (+5.0%), 99.1% (+0.9%), 99.7% (+0.6%), and 95.3% (+0.5%) on the four datasets respectively, thereby improving on the state of the art. We also perform additional experiments with thermal images instead of depth images, showing the high generalization ability of our solution when other modalities are adopted to guide the attention mechanism in place of depth information.
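The core idea of the abstract, using depth features to produce a spatial attention map that reweights RGB features, can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the attention head here is a hypothetical single 1x1 convolution (weight vector `w`) followed by a sigmoid, whereas the paper derives its attention map from full CNN-extracted depth features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_guided_attention(rgb_feats, depth_feats, w, b=0.0):
    """Modulate RGB feature maps with an attention map derived from depth.

    rgb_feats, depth_feats: arrays of shape (C, H, W)
    w: (C,) weights of a hypothetical 1x1 convolution that collapses the
       depth channels into a single spatial logit map (an assumption made
       for this sketch).
    """
    # 1x1 conv over channels -> (H, W) logits, squashed into (0, 1)
    attn_logits = np.tensordot(w, depth_feats, axes=([0], [0])) + b
    attn = sigmoid(attn_logits)
    # Broadcast the spatial attention map across all RGB feature channels,
    # so depth decides "where to look" in the RGB features
    return rgb_feats * attn[None, :, :], attn

# Usage with random feature maps standing in for CNN outputs
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))
depth = rng.standard_normal((8, 4, 4))
w = rng.standard_normal(8)
out, attn = depth_guided_attention(rgb, depth, w)
```

In a trained model, `w` and `b` would be learned jointly with both feature extractors, so the depth branch learns to up-weight face regions carrying person-specific information.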

Updated: 2024-08-22