EfficientPose: Efficient human pose estimation with neural architecture search
Computational Visual Media (IF 6.9), Pub Date: 2021-04-07, DOI: 10.1007/s41095-021-0214-z
Wenqiang Zhang, Jiemin Fang, Xinggang Wang, Wenyu Liu

Human pose estimation from image and video is a key task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to implement the networks on lightweight devices. Nowadays, real-time multimedia applications call for more efficient models for better interaction. Moreover, most deep neural networks for pose estimation directly reuse networks designed for image classification as the backbone, which are not optimized for the pose estimation task. In this paper, we propose an efficient framework for human pose estimation with two parts, an efficient backbone and an efficient head. By implementing a differentiable neural architecture search method, we customize the backbone network design for pose estimation, and reduce computational cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to promote the performance of the final prediction. In experiments, we evaluate our networks on the MPII and COCO datasets. Our smallest model requires only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model needs only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model, HRNet, which takes 9.5 GFLOPs.
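The PCKh@0.5 figure quoted above counts a predicted keypoint as correct when it lies within 0.5 times the person's head segment length of the ground-truth location. A minimal sketch of that metric follows; the function name `pckh`, the input layout, and the toy coordinates are assumptions made for illustration and are not taken from the paper or from the MPII evaluation toolkit. It also simplifies the real protocol by treating every joint as annotated and visible.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of keypoints whose error is within alpha * head size.

    pred, gt:    list of persons, each a list of (x, y) keypoints.
    head_sizes:  one head segment length per person, in the same
                 units as the coordinates (hypothetical input layout).
    alpha:       threshold factor; 0.5 gives PCKh@0.5.
    """
    correct = 0
    total = 0
    for p_kps, g_kps, head in zip(pred, gt, head_sizes):
        for (px, py), (gx, gy) in zip(p_kps, g_kps):
            # Euclidean distance between prediction and ground truth
            dist = math.hypot(px - gx, py - gy)
            if dist <= alpha * head:
                correct += 1
            total += 1
    # Avoid division by zero when no keypoints are supplied
    return correct / total if total else 0.0


# Toy data (made up): two persons with two keypoints each
pred = [[(10.0, 10.0), (20.0, 20.0)], [(5.0, 5.0), (50.0, 50.0)]]
gt = [[(11.0, 10.0), (30.0, 20.0)], [(5.0, 6.0), (50.0, 50.0)]]
head_sizes = [8.0, 4.0]

score = pckh(pred, gt, head_sizes)
print(f"PCKh@0.5 = {score:.2%}")
```

Normalizing by head size rather than by a fixed pixel distance keeps the threshold comparable across people who appear at different scales in the image.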



Updated: 2021-04-08