Attention-Based Temporal Encoding Network with Background-Independent Motion Mask for Action Recognition
Computational Intelligence and Neuroscience Pub Date : 2021-03-30 , DOI: 10.1155/2021/8890808
Zhengkui Weng 1 , Zhipeng Jin 1 , Shuangxi Chen 1 , Quanquan Shen 1 , Xiangyang Ren 2, 3 , Wuzhao Li 4
Convolutional neural networks (CNNs) have advanced rapidly in recent years. However, high dimensionality, rich human dynamics, and diverse background interference make it difficult for traditional CNNs to capture complex motion in videos. We propose a novel framework, the attention-based temporal encoding network (ATEN) with a background-independent motion mask (BIMM), for video action recognition. First, we introduce a motion segmentation method based on a boundary prior, obtained by associating each pixel with its minimal geodesic distance in a weighted undirected graph. Second, we propose a dynamic-contrast segmentation strategy for segmenting moving objects in complex environments. We then build the BIMM, which enhances the moving object by suppressing the irrelevant background in each frame. Furthermore, we design a long-range attention mechanism inside ATEN that effectively models the long-term dependencies of complex, non-periodic actions by automatically focusing on the semantically vital frames rather than treating all sampled frames equally; the attention mechanism thus suppresses temporal redundancy and highlights the discriminative frames. Finally, the framework is evaluated on the HMDB51 and UCF101 datasets, where our ATEN with BIMM achieves 94.5% and 70.6% accuracy, respectively, outperforming a number of existing methods on both datasets.
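The boundary-prior step can be illustrated with a small sketch. The idea is that pixels on the frame border are assumed to be background, so the minimal geodesic distance from each pixel to the border (over a weighted grid graph) is small for background and large for pixels enclosed by strong motion edges. The abstract does not specify the graph construction; the sketch below is an assumption, using per-pixel motion cost as the edge weight on a 4-connected grid and multi-source Dijkstra from all border pixels. The function name `geodesic_to_boundary` and the synthetic motion map are illustrative, not from the paper.

```python
import heapq
import numpy as np

def geodesic_to_boundary(weight):
    """Minimal geodesic distance from every pixel to the frame boundary
    on a 4-connected weighted grid graph (multi-source Dijkstra).

    weight: (H, W) non-negative costs (e.g. motion-gradient magnitude).
    Returns an (H, W) distance map; under the boundary prior, small
    values mark background-like pixels, large values mark pixels
    separated from the border by strong motion edges.
    """
    H, W = weight.shape
    dist = np.full((H, W), np.inf)
    heap = []
    # Boundary prior: every border pixel is a zero-distance source.
    for i in range(H):
        for j in range(W):
            if i in (0, H - 1) or j in (0, W - 1):
                dist[i, j] = 0.0
                heapq.heappush(heap, (0.0, i, j))
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i, j]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < H and 0 <= nj < W:
                nd = d + weight[ni, nj]  # cost of stepping onto (ni, nj)
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    heapq.heappush(heap, (nd, i + di, j + dj))
    return dist

# Synthetic motion map: one strong motion edge at the centre pixel.
w = np.full((5, 7), 0.1)
w[2, 3] = 5.0
d = geodesic_to_boundary(w)
assert d[0, 0] == 0.0       # border pixels are sources
assert d[2, 3] > d[1, 1]    # centre pixel is geodesically far from the border
```

A thresholded version of this distance map is one plausible building block for a motion mask such as the BIMM: pixels with large geodesic distance are kept as foreground, the rest suppressed as background.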
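The long-range attention idea can likewise be sketched. Instead of averaging all sampled frames equally, each frame's feature is scored for relevance and the video descriptor is an attention-weighted sum, so redundant frames receive low weight and discriminative ones dominate. This is a minimal NumPy sketch, not ATEN itself: the scoring vector `w` stands in for learned attention parameters, and the feature dimensions are arbitrary.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(frame_feats, w):
    """Score each sampled frame, then form a weighted video descriptor.

    frame_feats: (T, D) array of per-frame CNN features.
    w: (D,) scoring vector (learned in practice; fixed here).
    Returns the (D,) attention-pooled feature and the (T,) weights.
    """
    scores = frame_feats @ w          # relevance score per frame
    alpha = softmax(scores)           # normalized attention weights
    video_feat = alpha @ frame_feats  # weighted temporal average
    return video_feat, alpha

rng = np.random.default_rng(0)
T, D = 8, 16                          # 8 sampled frames, 16-dim features
feats = rng.standard_normal((T, D))
w = rng.standard_normal(D)
v, alpha = attention_pool(feats, w)
assert np.isclose(alpha.sum(), 1.0)   # weights form a distribution
assert v.shape == (D,)
```

With uniform weights (alpha = 1/T) this reduces to plain temporal average pooling; the attention scores are what let the model emphasize semantically vital frames and suppress temporal redundancy.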
