Composite recurrent network with internal denoising for facial alignment in still and video images in the wild, Image and Vision Computing


Image and Vision Computing (IF 4.2), Pub Date: 2021-04-26, DOI: 10.1016/j.imavis.2021.104189
Decky Aspandi, Oriol Martinez, Federico Sukno, Xavier Binefa

Facial alignment is an essential task for many higher-level facial analysis applications, such as animation, human activity recognition and human-computer interaction. Although the recent availability of big datasets and powerful deep-learning approaches have enabled major improvements in state-of-the-art accuracy, the performance of current approaches can severely deteriorate when dealing with images in highly unconstrained conditions, which limits the real-life applicability of such models. In this paper, we propose a composite recurrent tracker with internal denoising that jointly addresses both single-image facial alignment and deformable facial tracking in the wild. Specifically, we incorporate multilayer LSTMs to model temporal dependencies with variable length and introduce an internal denoiser which selectively enhances the input images to improve the robustness of our overall model. We achieve this by combining 4 different sub-networks that specialize in each of the key tasks that are required, namely face detection, bounding-box tracking, facial region validation and facial alignment with internal denoising. These blocks are endowed with novel algorithms, resulting in a facial tracker that is accurate, robust to in-the-wild settings and resilient against drifting. We demonstrate this by testing our model on the 300-W and Menpo datasets for single-image facial alignment, and on the 300-VW dataset for deformable facial tracking. Comparison against 20 other state-of-the-art methods demonstrates the excellent performance of the proposed approach.
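
The abstract names four cooperating stages: face detection, bounding-box tracking, facial region validation, and facial alignment with internal denoising. The following is a minimal control-flow sketch of how such stages might be composed, not the authors' implementation. The class name `CompositeTracker`, the four callables, and the rule "re-detect when validation fails" are all assumptions made for illustration. The abstract states only that the tracker is resilient against drifting, not how recovery is triggered. The LSTMs and the denoiser are not modelled here and are assumed to live inside the callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height), assumed layout
Landmarks = List[Tuple[float, float]]     # one (x, y) pair per facial landmark


@dataclass
class CompositeTracker:
    """Hypothetical wrapper around the four stages named in the abstract.

    Each field is a stand-in callable; real sub-networks would replace them.
    """
    detect: Callable[[Any], Optional[Box]]   # face detection
    track: Callable[[Any, Box], Box]         # bounding-box tracking
    validate: Callable[[Any, Box], bool]     # facial region validation
    align: Callable[[Any, Box], Landmarks]   # alignment (denoising assumed inside)

    def run(self, frames: Iterable[Any]) -> List[Optional[Landmarks]]:
        results: List[Optional[Landmarks]] = []
        box: Optional[Box] = None
        for frame in frames:
            if box is not None:
                # Propagate the previous box to the current frame.
                box = self.track(frame, box)
                # Assumed drift handling: drop a box that fails validation.
                if not self.validate(frame, box):
                    box = None
            if box is None:
                # First frame, or recovery after a rejected box.
                box = self.detect(frame)
            if box is None:
                results.append(None)  # no face found in this frame
                continue
            results.append(self.align(frame, box))
        return results
```

Under these assumptions a still image is simply a one-frame sequence: `run([image])` goes straight to detection and alignment, while a video reuses the tracked box from frame to frame and falls back to detection only when validation rejects it.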


Updated: 2021-05-07
