A Divide-and-Conquer Strategy for Facial Landmark Detection using Dual-task CNN Architecture
Pattern Recognition (IF 8), Pub Date: 2020-11-01, DOI: 10.1016/j.patcog.2020.107504
Rachida Hannane, Abdessamad Elboushaki, Karim Afdel

Abstract: In this paper, we propose a novel deep learning-based framework for facial landmark detection. The framework takes as input a face image returned by a face detector (Faster R-CNN) and outputs a set of landmark positions. Prior CNN-based methods often randomly select small local patches to predict an initial guess of the landmark locations. One issue with these local patches is that adjacent landmarks may share the same regions because the patches overlap, so the patches may not convey precise information about each individual landmark. By contrast, our approach formulates the problem as a divide-and-conquer search for facial patches using a CNN architecture in a hierarchy, where the input face image is recursively split into two cohesive, non-overlapping subparts until each one contains only the region around the expected landmark. To achieve a better division of the face topology, the search is carried out in a structured coarse-to-fine manner, guided by a learned hierarchical model of the face that defines the granularity of each division level. We also propose a cascaded regressor to detect and refine the position of the individual landmark in each predicted non-overlapping patch. We adopt a carefully designed shallow CNN architecture to improve real-time performance. In addition, unlike previous cascaded methods, our regressor does not require auxiliary input such as initial landmark locations. Extensive experiments on several challenging datasets (MTFL, AFW, AFLW, COFW, 300W, and 300VW) show that our approach performs particularly well in unconstrained scenarios, where it outperforms prior art in both accuracy and efficiency.
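
The recursive division described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. In the paper a CNN predicts each division and a learned hierarchical face model sets the granularity, whereas the sketch below assumes a fixed, hand-written binary tree and a purely geometric midpoint cut as stand-ins. All names (`Node`, `midpoint_split`, `divide`, `demo_tree`, `overlap_area`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A box is (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class Node:
    """One node of a hypothetical face hierarchy (a stand-in for the learned model)."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def midpoint_split(box: Box) -> Tuple[Box, Box]:
    """Placeholder for the CNN-predicted division: halve the box across its longer side.

    Ties are cut horizontally, so a square box yields an upper and a lower half.
    """
    x0, y0, x1, y1 = box
    if (y1 - y0) >= (x1 - x0):
        ym = (y0 + y1) // 2
        return (x0, y0, x1, ym), (x0, ym, x1, y1)
    xm = (x0 + x1) // 2
    return (x0, y0, xm, y1), (xm, y0, x1, y1)


def divide(box: Box, node: Node) -> List[Tuple[str, Box]]:
    """Recursively split `box` into two non-overlapping parts until a leaf is reached."""
    if node.left is None or node.right is None:
        return [(node.name, box)]
    first, second = midpoint_split(box)
    return divide(first, node.left) + divide(second, node.right)


def overlap_area(a: Box, b: Box) -> int:
    """Area shared by two boxes; 0 when they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def demo_tree() -> Node:
    """Hand-written two-level hierarchy with four leaf patches (an assumption)."""
    return Node(
        "face",
        Node("upper", Node("upper_left"), Node("upper_right")),
        Node("lower", Node("lower_left"), Node("lower_right")),
    )


if __name__ == "__main__":
    for name, patch in divide((0, 0, 64, 64), demo_tree()):
        print(name, patch)
```

In this sketch each leaf patch would then be passed to a per-patch regressor that locates a single landmark. The regressor is omitted here because the abstract does not specify its architecture.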

