Action Recognition using High Temporal resolution 3D Neural Network based on Dilated Convolution
IEEE Access ( IF 3.9 ) Pub Date : 2020-01-01 , DOI: 10.1109/access.2020.3022407
Yongyang Xu , Yaxing Feng , Zhong Xie , Mingyu Xie , Wei Luo

3D convolutional neural networks (CNNs), an important class of deep learning models, perform well at recognizing actions in videos. When recognizing actions, however, 3D CNNs usually down-sample along the temporal dimension, which discards temporal information. To retain more temporal information from the videos, this work proposes a new model based on the Inflated 3D ConvNet (I3D), named I3D-T. Instead of down-sampling along the temporal dimension, the proposed model applies dilated convolution along that dimension to enlarge the receptive field. In addition, a non-local feature gating block is designed to learn the correlations between different feature maps. Experimental results show that I3D-T achieves state-of-the-art performance: using RGB frames as input, it reaches action recognition accuracies of 95% on UCF101 and 74.8% on HMDB-51, both public datasets.
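
The trade-off between temporal down-sampling and temporal dilation can be illustrated with a little arithmetic. The sketch below is not the I3D-T architecture; the two-layer stacks, the 16-frame clip length, and the helper names `output_length` and `summarize` are assumptions chosen only for illustration. It compares a strided stack with a dilated stack along the temporal axis and reports how many frames survive and how many input frames each output position can see.

```python
def output_length(t, kernel, stride=1, dilation=1):
    """Frames left after one temporal convolution with 'same'-style padding."""
    pad = dilation * (kernel - 1) // 2
    return (t + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def summarize(layers, t):
    """Return (output frames, receptive field in input frames) for a stack.

    Each layer is a (kernel, stride, dilation) tuple along the temporal axis.
    """
    receptive_field, jump = 1, 1
    for kernel, stride, dilation in layers:
        t = output_length(t, kernel, stride, dilation)
        # Each layer widens the view by (kernel - 1) taps, spaced by the
        # dilation and by the cumulative stride of the layers before it.
        receptive_field += (kernel - 1) * dilation * jump
        jump *= stride
    return t, receptive_field


# Hypothetical stacks: two stride-2 layers versus two stride-1 layers
# where the second layer uses dilation 2.
strided = [(3, 2, 1), (3, 2, 1)]
dilated = [(3, 1, 1), (3, 1, 2)]

print(summarize(strided, 16))  # (4, 7)
print(summarize(dilated, 16))  # (16, 7)
```

Under these assumptions both stacks reach a receptive field of 7 input frames, but the strided stack keeps only 4 of the 16 temporal positions while the dilated stack keeps all 16, which is the kind of temporal resolution the abstract aims to preserve.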

Updated: 2020-01-01