Multi-stream slowFast graph convolutional networks for skeleton-based action recognition
Image and Vision Computing (IF 4.2), Pub Date: 2021-02-23, DOI: 10.1016/j.imavis.2021.104141
Ning Sun, Ling Leng, Jixin Liu, Guang Han

Recently, many efforts have been made to model spatial–temporal features of human skeletons for action recognition using graph convolutional networks (GCN). A skeleton sequence can precisely represent human pose with a small number of joints, yet it still contains considerable redundancy in terms of temporal dependency. To improve the effectiveness of spatial–temporal feature extraction from skeleton sequences, a SlowFast graph convolutional network (SF-GCN) is proposed by implementing the architecture of the SlowFast network, which consists of a Fast pathway and a Slow pathway, in the GCN model. The Fast pathway is a lightweight GCN with embedded temporal attention that extracts features of fast temporal changes from the skeleton sequence at a high frame rate and fast refreshing speed. The Slow pathway is a GCN with embedded spatial attention that extracts features of slow temporal changes from the skeleton sequence at a low frame rate and slow refreshing speed. The features of the two pathways are fused through lateral connections and weighted by channel attention. With this design, SF-GCN achieves superior feature extraction ability while significantly reducing computational cost. In addition to the coordinate information of joints, five high-order sequences, comprising the edges and the spatial and temporal differences of joints and edges, are introduced to enhance the representation of human action. Six SF-GCNs are implemented to extract spatial–temporal features from the six kinds of sequences and are fused for skeleton-based action recognition; the resulting model is called multi-stream SlowFast graph convolutional networks (MSSF-GCN). Extensive experiments are conducted to evaluate the proposed method on three skeleton-based action recognition databases: NTU RGB + D, NTU RGB + D 120, and Skeleton-Kinetics. The results show that the proposed method is effective for skeleton-based action recognition and achieves recognition accuracy clearly higher than that of state-of-the-art methods.
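The data flow described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the high-order sequences precisely, so here an edge is taken as child joint minus parent joint, the spatial difference as the offset from the per-frame centroid, and the temporal difference as the change between consecutive frames. The function names, the 5-joint skeleton, the frame-rate ratio `alpha`, and the use of strided sampling plus channel concatenation for the lateral connection are all hypothetical stand-ins. The GCN layers and the temporal, spatial, and channel attention modules are omitted.

```python
import numpy as np

# Hypothetical 5-joint chain used only for illustration: (child, parent) pairs.
# Real datasets define their own, larger skeleton graphs.
EDGES = [(1, 0), (2, 1), (3, 2), (4, 3)]


def build_streams(joints, edges=EDGES):
    """Derive six input sequences from joint coordinates of shape (T, V, C)."""
    # Edge vectors: child minus parent; the root joint keeps a zero vector.
    bones = np.zeros_like(joints)
    for child, parent in edges:
        bones[:, child] = joints[:, child] - joints[:, parent]

    def spatial_diff(x):
        # Assumption: offset of every node from the centroid of its own frame.
        return x - x.mean(axis=1, keepdims=True)

    def temporal_diff(x):
        # Assumption: frame-to-frame change, zero-padded at the last frame
        # so that the output keeps length T.
        d = np.zeros_like(x)
        d[:-1] = x[1:] - x[:-1]
        return d

    return {
        "joint": joints,
        "edge": bones,
        "joint_spatial_diff": spatial_diff(joints),
        "joint_temporal_diff": temporal_diff(joints),
        "edge_spatial_diff": spatial_diff(bones),
        "edge_temporal_diff": temporal_diff(bones),
    }


def slowfast_inputs(sequence, alpha=4):
    """Return (slow, fast) views of one stream; slow keeps every alpha-th frame."""
    fast = sequence           # full frame rate for the Fast pathway
    slow = sequence[::alpha]  # reduced frame rate for the Slow pathway
    return slow, fast


def lateral_fuse(slow_feat, fast_feat, alpha=4):
    """Align Fast features to the Slow frame rate and concatenate channels."""
    fast_aligned = fast_feat[::alpha]  # strided sampling stands in for a learned layer
    return np.concatenate([slow_feat, fast_aligned], axis=-1)
```

In this sketch each of the six streams would feed its own two-pathway network, and the six outputs would then be combined for the final prediction; how the paper performs that final fusion is not stated in the abstract.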




Updated: 2021-03-04