RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization
The Visual Computer (IF 3.0) Pub Date: 2020-10-30, DOI: 10.1007/s00371-020-01934-1
Ziheng Zhang, Dongze Lian, Shenghua Gao

In this work, we use a multi-column CNN framework to estimate the gaze point of a person sitting in front of a display from an RGB-D image of that person. Since the gaze point is determined by the head pose, the eyeball pose, and the 3D eye position, we propose to infer these three components separately and then integrate them for gaze point estimation. The captured depth images, however, usually contain noise and black holes, which prevent us from obtaining reliable head pose and 3D eye position estimates. We therefore refine the raw depths of 68 facial keypoints: we first estimate their relative depths from the RGB face image, and then combine these relative depths with the captured raw depths to solve for the absolute depth of every facial keypoint via global optimization. The refined depths yield reliable estimates of both head pose and 3D eye position. Because existing publicly available RGB-D gaze-tracking datasets are small, we also build a new dataset for training and validating our method; to the best of our knowledge, it is the largest RGB-D gaze-tracking dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the Eyediap dataset.
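The depth-refinement step described above can be read as a linear least-squares problem: the CNN constrains relative depths between facial keypoints, while the valid raw sensor depths anchor the absolute scale. The following is a minimal sketch of that idea, not the authors' code; the pairwise-difference formulation, the weights, and the function names are illustrative assumptions.

```python
# Sketch of the global-optimization step: recover absolute depths for 68 facial
# keypoints from (a) noisy, partially missing raw sensor depths and (b) predicted
# relative depths, by solving a linear least-squares system. Illustrative only.

import numpy as np

def refine_keypoint_depths(raw_depth, raw_valid, rel_diff, w_raw=1.0, w_rel=1.0):
    """Solve for absolute depths d (shape [68]) minimizing
         w_raw * sum_i valid_i * (d_i - raw_i)^2
       + w_rel * sum_{i<j} (d_i - d_j - rel_diff[i, j])^2
    where rel_diff[i, j] is the predicted depth of keypoint i minus keypoint j."""
    n = raw_depth.shape[0]
    rows, rhs = [], []

    # Data term: keep d_i close to the raw sensor depth where it is valid.
    for i in range(n):
        if raw_valid[i]:
            r = np.zeros(n); r[i] = w_raw
            rows.append(r); rhs.append(w_raw * raw_depth[i])

    # Consistency term: respect the predicted pairwise depth differences.
    for i in range(n):
        for j in range(i + 1, n):
            r = np.zeros(n); r[i] = w_rel; r[j] = -w_rel
            rows.append(r); rhs.append(w_rel * rel_diff[i, j])

    A = np.stack(rows); b = np.array(rhs)
    d, *_ = np.linalg.lstsq(A, b, rcond=None)   # global least-squares solution
    return d

# Toy usage: 68 keypoints, a quarter of the raw depths missing (holes), noisy otherwise.
rng = np.random.default_rng(0)
true_d = 600.0 + 30.0 * rng.standard_normal(68)              # depths in mm
raw_valid = rng.random(68) > 0.25
raw_depth = np.where(raw_valid, true_d + 5.0 * rng.standard_normal(68), 0.0)
rel_diff = true_d[:, None] - true_d[None, :]                  # idealized predictions
refined = refine_keypoint_depths(raw_depth, raw_valid, rel_diff)
print("mean abs error (mm):", np.abs(refined - true_d).mean())
```

Note that the pairwise terms only fix depth differences, so the keypoints with valid raw measurements are what pin down the absolute offset; keypoints whose raw depth falls in a hole still receive a consistent absolute value through the relative constraints.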

Updated: 2020-10-30