Deep convolutional neural network-based Bernoulli heatmap for head pose estimation
Neurocomputing (IF 5.5), Pub Date: 2021-01-19, DOI: 10.1016/j.neucom.2021.01.048
Zhongxu Hu , Yang Xing , Chen Lv , Peng Hang , Jie Liu

Head pose estimation is a crucial problem for many tasks, such as driver attention monitoring, fatigue detection, and human behaviour analysis. Neural networks are generally regarded as better at classification than at regression. Having the network output angle values directly makes optimization a highly nonlinear process, and the loss function constrains the weights only weakly. This paper proposes a novel Bernoulli heatmap for head pose estimation from a single RGB image. The method localizes the head region while estimating the head angles. The Bernoulli heatmap makes it possible to build fully convolutional neural networks without fully connected layers and offers a new output form for head pose estimation. A deep convolutional neural network (CNN) structure with multiscale representations is adopted to maintain high-resolution and low-resolution information in parallel, which preserves rich, high-resolution representations. In addition, channelwise fusion makes the fusion weights learnable instead of relying on simple addition with equal weights. As a result, the estimation is spatially more precise and potentially more accurate. The effectiveness of the proposed method is demonstrated empirically by comparison with other state-of-the-art methods on public datasets.
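The abstract contrasts channelwise fusion with learnable weights against simple addition with equal weights. The following is a minimal sketch of that contrast only, not the authors' implementation: the function names, the nested-list feature maps, the nearest-neighbour upsampling, and the fixed example weights are all assumptions made for illustration. In a real network the weights would be parameters updated by backpropagation, and the two branches would usually need a convolution to align their channel counts, which is skipped here by assuming equal channel counts.

```python
# Hypothetical sketch: names, shapes, and weights are illustrative and are
# not taken from the paper.

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a [C][H][W] nested-list feature map."""
    return [
        [
            [row[x // factor] for x in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)  # repeat each row `factor` times
        ]
        for channel in fmap
    ]


def fuse_channelwise(high, low, w_high, w_low, factor):
    """Fuse a high- and a low-resolution branch with one weight per channel.

    Assumes both branches have the same number of channels and that `low`
    is `factor` times smaller than `high` along each spatial axis.
    """
    low_up = upsample_nearest(low, factor)
    return [
        [
            [wh * h + wl * l for h, l in zip(h_row, l_row)]
            for h_row, l_row in zip(h_ch, l_ch)
        ]
        for h_ch, l_ch, wh, wl in zip(high, low_up, w_high, w_low)
    ]


# Two channels: a 2x2 high-resolution branch and a 1x1 low-resolution branch.
high = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
low = [[[10.0]], [[20.0]]]

# Equal weights reduce to plain element-wise addition.
equal = fuse_channelwise(high, low, [1.0, 1.0], [1.0, 1.0], factor=2)

# Per-channel weights (fixed here, learned in practice) rebalance each channel.
weighted = fuse_channelwise(high, low, [0.5, 1.0], [0.25, 0.0], factor=2)
```

With all weights set to 1.0 the function reproduces equal-weight addition, so the learnable variant can be read as a generalization of it: each channel gets to decide how much of each resolution it keeps.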




Updated: 2021-02-03