Recognizing Actions in Images by Fusing Multiple Body Structure Cues
Pattern Recognition (IF 8) Pub Date: 2020-08-01, DOI: 10.1016/j.patcog.2020.107341
Yang Li, Kan Li, Xinxin Wang

Abstract Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvement in image-based action recognition due to their limited capability to exploit body structure information. In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images. To fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture the structure information of human bodies from the global and local perspectives, respectively. We then design the Action Classification sub-network to fuse the predictions from multiple body structure cues and obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves computational efficiency in both the training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP respectively, outperforming all recent approaches in this field.
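
The unified design described in the abstract, a shared convolutional backbone feeding one prediction branch per body structure cue, with per-cue scores fused into a final prediction, can be outlined compactly. Below is a minimal PyTorch sketch under stated assumptions: the backbone is a small hand-rolled convolutional stack rather than the paper's, the cue names are hypothetical placeholders mirroring the cues named above, and plain score averaging stands in for the fusion performed by the paper's Action Classification sub-network. The paper's actual cue extraction (Structural Body Parts, Limb Angle Descriptor) is not reproduced here.

import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Bottom convolutional layers shared by both sub-networks,
    so the feature map is computed once per image."""
    def __init__(self, out_channels=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class CueHead(nn.Module):
    """One branch per body structure cue: pools the shared features
    and predicts per-class action scores for that cue."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feats):
        return self.fc(self.pool(feats).flatten(1))  # per-cue logits

class FusedActionNet(nn.Module):
    """Unified model: shared backbone plus one head per cue.
    Cue names are hypothetical stand-ins for the paper's cues."""
    def __init__(self, num_classes,
                 cues=("whole_body", "structural_parts", "limb_angles")):
        super().__init__()
        self.backbone = SharedBackbone()
        self.heads = nn.ModuleDict({c: CueHead(256, num_classes) for c in cues})

    def forward(self, x):
        feats = self.backbone(x)  # computed once, shared by all cue heads
        logits = [head(feats) for head in self.heads.values()]
        # Fuse per-cue predictions; simple averaging is an assumption,
        # standing in for the paper's Action Classification sub-network.
        return torch.stack(logits).mean(dim=0)

model = FusedActionNet(num_classes=40)  # e.g., Stanford40 has 40 action classes
scores = model(torch.randn(2, 3, 224, 224))
print(scores.shape)  # torch.Size([2, 40])

Sharing the backbone is what yields the efficiency gain claimed in the abstract: both sub-networks reuse one forward pass through the bottom layers during training and testing, rather than each recomputing features from the raw image.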
