Hybrid Refinement-Correction Heatmaps for Human Pose Estimation
IEEE Transactions on Multimedia (IF 7.3), Pub Date: 2020-06-03, DOI: 10.1109/tmm.2020.2999181
Aouaidjia Kamel, Bin Sheng, Ping Li, Jinman Kim, David Dagan Feng

In this paper, we present a method (Hybrid-Pose) to improve human pose estimation in images. We adopt Stacked Hourglass Networks to design two convolutional neural network models: RNet (Refinement Network) for pose refinement and CNet (Correction Network) for pose correction. The CNet guides the RNet to correct the joint locations before the final pose is generated. Each of the two models is composed of four hourglasses, and each hourglass generates a group of detection heatmaps for the joints. The RNet hourglasses all share the same structure, whereas the CNet is built from hourglasses of different structures to provide pose guidance. Since pose estimation in RGB images is very sensitive to the image scene, our approach generates multiple sets of detection heatmaps to broaden the search scope for the correct joint locations. We use the RNet model to refine the joint locations at each hourglass stage horizontally; the heatmaps of each stage are then fused vertically with the heatmaps of all the CNet hourglasses in a hybrid manner. Our method achieves results competitive with existing state-of-the-art approaches on the MPII and FLIC benchmark datasets. Although our method focuses on improving single-person pose estimation, we also show the influence of this improvement on multi-person pose estimation by detecting multiple people with an SSD detector and then estimating the pose of each person individually.
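
The hybrid fusion step described in the abstract can be illustrated with a small sketch. The abstract does not state which fusion operator is used, so the element-wise averaging below is an assumption, as are the function names `fuse_heatmaps` and `decode_joints` and the array shapes. It only shows the general idea of combining one RNet stage's heatmaps with the heatmaps of all CNet hourglasses and then reading each joint location from the peak of the fused map.

```python
import numpy as np


def fuse_heatmaps(rnet_stage, cnet_stages):
    """Fuse one RNet stage with all CNet hourglass outputs.

    rnet_stage:  array of shape (J, H, W), one heatmap per joint.
    cnet_stages: array of shape (S, J, H, W), S CNet hourglasses.
    Element-wise averaging is an assumption for illustration;
    the abstract does not specify the fusion rule.
    """
    stacked = np.concatenate([rnet_stage[None], cnet_stages], axis=0)
    return stacked.mean(axis=0)


def decode_joints(heatmaps):
    """Return a (J, 2) array of (x, y) peak locations, one per joint."""
    num_joints, height, width = heatmaps.shape
    flat_peaks = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_peaks, (height, width))
    return np.stack([xs, ys], axis=1)


if __name__ == "__main__":
    # Toy data: 2 joints on a 4x4 grid, 4 CNet hourglasses.
    rnet = np.zeros((2, 4, 4))
    rnet[0, 1, 1] = 1.0          # RNet peak for joint 0 (assumed wrong)
    rnet[1, 2, 3] = 1.0          # RNet peak for joint 1
    cnet = np.zeros((4, 2, 4, 4))
    cnet[:, 0, 3, 2] = 0.9       # all CNet stages agree on joint 0
    cnet[:, 1, 2, 3] = 0.8       # and confirm joint 1
    print(decode_joints(fuse_heatmaps(rnet, cnet)))
```

In this toy case the four CNet maps outvote the single RNet peak for the first joint, which is one way to picture the correction network guiding the refinement network. A learned or weighted fusion would be just as consistent with the abstract.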

Updated: 2020-06-03