3D RANs: 3D Residual Attention Networks for action recognition
The Visual Computer ( IF 3.0 ) Pub Date : 2019-07-25 , DOI: 10.1007/s00371-019-01733-3
Jiahui Cai , Jianguo Hu

In this work, we propose 3D Residual Attention Networks (3D RANs) for action recognition, which learn spatiotemporal representations from videos. The proposed network combines an attention mechanism with a 3D ResNets architecture and captures spatiotemporal information in an end-to-end manner. Specifically, we add attention mechanisms along the channel and spatial dimensions to each block of 3D ResNets. For each sliced tensor of an intermediate feature map, we sequentially infer channel and spatial attention maps via the channel and spatial attention submodules in each residual unit, and the attention maps are multiplied with the input feature map to reweight the key features. We validate our network through extensive experiments on the UCF-101, HMDB-51 and Kinetics datasets. The experiments show that the proposed 3D RANs outperform state-of-the-art approaches for action recognition, demonstrating the effectiveness of our networks.
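The sequential channel-then-spatial reweighting described above can be sketched in a few lines. This is a minimal illustration with NumPy, not the paper's implementation: the gating here is a bare sigmoid over pooled descriptors (a real submodule would insert learned layers), and `residual_fn` is a hypothetical stand-in for the block's 3D convolutional path.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, T, H, W). Global average pooling over the spatiotemporal
    # dims gives one descriptor per channel; gating it yields weights
    # in (0, 1). (A learned MLP would normally sit between the pool
    # and the sigmoid -- omitted, so this is only a sketch.)
    desc = x.mean(axis=(1, 2, 3))              # (C,)
    w = sigmoid(desc)                          # (C,)
    return x * w[:, None, None, None]          # broadcast reweighting

def spatial_attention(x):
    # Pool across channels to get a (T, H, W) saliency map, gate it,
    # and reweight every spatiotemporal location.
    desc = x.mean(axis=0)                      # (T, H, W)
    w = sigmoid(desc)
    return x * w[None, ...]

def attended_residual_block(x, residual_fn):
    # Channel attention followed by spatial attention inside a residual
    # unit, mirroring "sequentially infer channel and spatial attention
    # maps ... multiplied to the input feature map". residual_fn is a
    # hypothetical placeholder for the 3D conv path.
    f = residual_fn(x)
    f = spatial_attention(channel_attention(f))
    return x + f                               # identity (residual) connection

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 7, 7))       # (C, T, H, W)
out = attended_residual_block(feat, lambda t: 0.1 * t)
print(out.shape)                               # (8, 4, 7, 7)
```

Because both attention maps pass through a sigmoid, each reweighting factor lies in (0, 1), so attention can only suppress or preserve features in the residual branch, never amplify them, while the identity connection keeps the original signal intact.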

Updated: 2019-07-25