SCN: Dilated silhouette convolutional network for video action recognition, Computer Aided Geometric Design


Computer Aided Geometric Design (IF 1.3), Pub Date: 2021-03-16, DOI: 10.1016/j.cagd.2021.101965
Michelle Hua, Mingqi Gao, Zichun Zhong

Human action is a spatio-temporal motion sequence with strong inter-dependencies between the spatial geometry and the temporal dynamics of motion. However, the existing literature on human action recognition from video rarely investigates spatial geometry and temporal dynamics together in a joint representation and embedding space. In this paper, we propose a dilated Silhouette Convolutional Network (SCN) for action recognition from monocular video. We model the spatial geometric information of the moving human subject using silhouette boundary curves extracted from each frame of the video. The silhouette curves are stacked along the time axis to form a 3D curve volume, which is resampled to a 3D point cloud as a unified spatio-temporal representation of the video action. With the dilated silhouette convolution, the SCN learns co-occurrence features jointly from low-level geometric shape boundaries and their temporal dynamics, and constructs a unified convolutional embedding space in which the spatial and temporal properties are integrated effectively. The geometry-based SCN significantly improves the discriminative power of the features learned from shape motion. Experimental results on the JHMDB, HMDB, and UCF101 datasets demonstrate the effectiveness and superiority of the proposed representation and deep learning method.
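The stacking and resampling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the silhouette boundary of each frame is already available as a closed 2D polyline, and it uses uniform arc-length resampling per frame, which is only one plausible way to turn the curve volume into a point cloud. The function names, `points_per_frame`, and `time_scale` are hypothetical and do not come from the paper.

```python
import numpy as np


def resample_closed_curve(curve, n_points):
    """Resample a closed 2D polyline to n_points spaced evenly by arc length.

    Assumption: `curve` is an (M, 2) sequence of boundary vertices in order.
    """
    curve = np.asarray(curve, dtype=float)
    closed = np.vstack([curve, curve[:1]])            # repeat first vertex to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])     # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n_points, endpoint=False)
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.stack([x, y], axis=1)


def silhouettes_to_point_cloud(curves, points_per_frame=64, time_scale=1.0):
    """Stack per-frame silhouette curves along time into a (T * N, 3) point cloud.

    Assumption: frame index times `time_scale` is used as the third coordinate.
    """
    layers = []
    for t, curve in enumerate(curves):
        xy = resample_closed_curve(curve, points_per_frame)
        z = np.full((points_per_frame, 1), t * time_scale)
        layers.append(np.hstack([xy, z]))
    return np.concatenate(layers, axis=0)


if __name__ == "__main__":
    # Toy input: the same unit square used as the silhouette in three frames.
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    cloud = silhouettes_to_point_cloud([square] * 3, points_per_frame=8)
    print(cloud.shape)
```

Each row of the result is an (x, y, t) point, so spatial shape and temporal position live in one coordinate space, which is the property the abstract attributes to the unified representation. How the paper weights the time axis relative to the image axes is not stated in the abstract; `time_scale` is left as a free parameter here for that reason.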


Updated: 2021-03-16
