Hierarchical Dynamic Programming Module for Human Pose Refinement
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.3), Pub Date: 4-4-2022, DOI: 10.1109/tcsvt.2022.3164663
Chunyang Xie 1 , Dongheng Zhang 2 , Yang Hu 3 , Yan Chen 2

Deep Convolutional Neural Networks (CNNs) have achieved remarkable performance on image-based human pose estimation. Nevertheless, directly applying these image-based models to videos is not only computationally intensive but may also cause jitter and loss. The main reason is that image-based models focus purely on the local features of individual frames and ignore the temporal information across adjacent frames. Several existing methods address this temporal coherency issue. However, they need to be designed carefully and cannot be combined with existing image-based methods. In this paper, we propose a simple yet effective module that refines the estimated pose by exploiting the temporal coherency among the heatmaps of adjacent frames, and that can be easily inserted into image-based networks as a plug-in. We show that the temporal coherency issue among the heatmap frames can be reformulated as a graph path selection optimization problem. Moreover, to speed up the refinement process, we propose a hierarchical graph optimization that performs the refinement from coarse to fine. Experimental results on two large-scale video pose estimation benchmarks show that our module improves performance with little speed loss when combined with image-based methods as an efficient plug-in.
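
To make the graph path selection idea concrete, the following is a minimal sketch of a Viterbi-style dynamic program that selects one candidate location per frame for a single joint. It is not the authors' implementation: the abstract does not give the cost function, so the log-confidence node term, the squared-distance edge penalty, the hypothetical function name `refine_joint_track`, the parameter `smooth_weight`, and the representation of each heatmap as K candidate peaks are all assumptions made for illustration.

```python
import numpy as np


def refine_joint_track(candidates, scores, smooth_weight=0.1):
    """Pick one candidate per frame with a Viterbi-style dynamic program.

    candidates: shape (T, K, 2), K candidate (x, y) peaks per frame (assumed input).
    scores:     shape (T, K), heatmap confidence of each candidate.
    Returns an array of shape (T, 2) with the selected locations.
    """
    candidates = np.asarray(candidates, dtype=float)
    scores = np.asarray(scores, dtype=float)
    T, K, _ = candidates.shape

    # Node reward (assumption): log-confidence; epsilon avoids log(0).
    node = np.log(scores + 1e-8)

    best = node[0].copy()               # best path value ending at each node
    back = np.zeros((T, K), dtype=int)  # back-pointers for path recovery

    for t in range(1, T):
        # Edge reward (assumption): penalize squared displacement
        # between candidates of frame t-1 (rows) and frame t (columns).
        diff = candidates[t - 1][:, None, :] - candidates[t][None, :, :]
        edge = -smooth_weight * np.sum(diff ** 2, axis=-1)
        total = best[:, None] + edge
        back[t] = np.argmax(total, axis=0)
        best = total[back[t], np.arange(K)] + node[t]

    # Backtrack from the best final node to recover the whole path.
    idx = np.empty(T, dtype=int)
    idx[-1] = int(np.argmax(best))
    for t in range(T - 1, 0, -1):
        idx[t - 1] = back[t, idx[t]]
    return candidates[np.arange(T), idx]


# Toy example: the middle frame has a spurious peak at (50, 50) that
# scores higher than the true location near (11, 10).
cands = np.array([
    [[10, 10], [50, 50]],
    [[11, 10], [50, 50]],
    [[12, 10], [50, 50]],
])
confs = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.9, 0.1],
])
path = refine_joint_track(cands, confs)
```

In this toy input a per-frame argmax would jump to the spurious peak in the middle frame, while the path-based selection stays on the temporally consistent candidates. A coarse-to-fine variant, in the spirit of the hierarchical optimization the abstract mentions, could run the same routine on downsampled heatmaps first and then restrict the fine-level candidates to the neighborhood of the coarse path; that arrangement is likewise an assumption here, not a description of the paper's design.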

Updated: 2024-08-26