Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-09-16, DOI: 10.1007/s11263-021-01521-4
Haibo Jin, Shengcai Liao, Ling Shao

Recently, heatmap regression models have become popular due to their superior performance in locating facial landmarks. However, three major problems still exist among these models: (1) they are computationally expensive; (2) they usually lack explicit constraints on global shapes; (3) domain gaps are commonly present. To address these problems, we propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression, which conducts score and offset predictions simultaneously on low-resolution feature maps. By doing so, repeated upsampling layers are no longer necessary, which greatly reduces inference time without sacrificing model accuracy. In addition, a simple but effective neighbor regression module is proposed to enforce local constraints by fusing predictions from neighboring landmarks, which enhances the robustness of the new detection head. To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum. This training strategy mines more reliable pseudo-labels from unlabeled data across domains by starting with an easier task and then gradually increasing the difficulty to provide more precise labels. Extensive experiments demonstrate the superiority of PIPNet, which obtains new state-of-the-art results on three out of six popular benchmarks under the supervised setting. The results on two cross-domain test sets are also consistently improved over the baselines. Notably, our lightweight version of PIPNet runs at 35.7 FPS on CPU and 200 FPS on GPU, while maintaining accuracy competitive with state-of-the-art methods. The code of PIPNet is available at https://github.com/jhb86253817/PIPNet.
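
To make the detection-head idea concrete, below is a minimal sketch (not the authors' code; the array shapes, the stride value, and the decode_pip_head helper are assumptions for illustration) of how per-landmark score and offset maps predicted on a low-resolution grid could be decoded into landmark coordinates without any upsampling layers: the highest-scoring cell gives the coarse location, and the predicted within-cell offsets recover sub-cell precision.

```python
# Hypothetical sketch of decoding a PIP-style head, assuming each landmark has
# one low-resolution score map plus x/y offset maps; shapes and stride are assumed.
import numpy as np

def decode_pip_head(scores, offsets_x, offsets_y, stride=32):
    """Decode landmark coordinates from low-resolution maps.

    scores:    (num_landmarks, H, W) score map per landmark
    offsets_x: (num_landmarks, H, W) horizontal offset within each cell
    offsets_y: (num_landmarks, H, W) vertical offset within each cell
    stride:    input-image pixels per feature-map cell (assumed value)
    Returns a (num_landmarks, 2) array of (x, y) coordinates in input-image pixels.
    """
    num_landmarks, h, w = scores.shape
    coords = np.zeros((num_landmarks, 2), dtype=np.float32)
    for i in range(num_landmarks):
        flat_idx = scores[i].argmax()              # highest-scoring grid cell
        cy, cx = np.unravel_index(flat_idx, (h, w))
        # add the fractional offsets predicted at that cell, then scale by stride
        coords[i, 0] = (cx + offsets_x[i, cy, cx]) * stride
        coords[i, 1] = (cy + offsets_y[i, cy, cx]) * stride
    return coords

# Toy usage: 68 landmarks on an 8x8 feature map from a 256x256 input (stride 32).
rng = np.random.default_rng(0)
scores = rng.random((68, 8, 8)).astype(np.float32)
ox = rng.random((68, 8, 8)).astype(np.float32)
oy = rng.random((68, 8, 8)).astype(np.float32)
print(decode_pip_head(scores, ox, oy).shape)  # (68, 2)
```

Because the offsets restore sub-cell precision, the head can operate directly on the low-resolution feature map; this is what removes the need for the repeated upsampling layers used by conventional heatmap regression.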




Updated: 2021-09-17