Skeleton-Based Action Recognition With Multi-Stream Adaptive Graph Convolutional Networks
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2020-10-09, DOI: 10.1109/tip.2020.3028207
Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu

Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance for skeleton-based action recognition. However, several issues remain in previous GCN-based models. First, the topology of the graph is set heuristically and kept fixed over all model layers and input data. This may not suit the hierarchy of the GCN model or the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, even though it is naturally more informative and discriminative for human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be learned either uniformly or individually, based on the input data, in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. In addition, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model pay more attention to important joints, frames and features. Moreover, the information of both the joints and the bones, together with their motion information, is simultaneously modeled in a multi-stream framework, which yields notable improvements in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin.
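The "second-order" bone information described above can be sketched concretely: each bone is the vector from a joint's parent to the joint itself, and its norm and unit direction give the bone's length and orientation. The following minimal Python sketch illustrates this on a toy skeleton; the joint indices and parent hierarchy here are purely illustrative and do not correspond to the actual NTU-RGBD 25-joint layout used in the paper.

```python
import math

# Hypothetical parent-joint table for a toy 5-joint skeleton
# (indices and hierarchy are illustrative, not the NTU-RGBD layout).
PARENTS = {1: 0, 2: 1, 3: 1, 4: 0}  # child joint -> parent joint

def bone_features(joints):
    """Derive second-order (bone) features from 3D joint coordinates.

    Each bone is the vector from a joint's parent to the joint itself;
    its norm is the bone length and its unit vector its orientation.
    """
    bones = {}
    for child, parent in PARENTS.items():
        vec = tuple(joints[child][d] - joints[parent][d] for d in range(3))
        length = math.sqrt(sum(c * c for c in vec))
        if length > 0:
            direction = tuple(c / length for c in vec)
        else:
            direction = (0.0, 0.0, 0.0)
        bones[child] = {"vector": vec, "length": length, "direction": direction}
    return bones

# Example: a tiny static pose (one frame of joint coordinates).
pose = {
    0: (0.0, 0.0, 0.0),
    1: (0.0, 1.0, 0.0),
    2: (0.5, 1.5, 0.0),
    3: (-0.5, 1.5, 0.0),
    4: (0.0, -1.0, 0.0),
}
feats = bone_features(pose)
```

In the paper's multi-stream framework, such bone vectors (and frame-to-frame differences of joints and bones, for motion) would feed separate network streams whose scores are fused; this sketch only shows the feature derivation step.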

Updated: 2020-10-20