Spatial-temporal saliency action mask attention network for action recognition
Journal of Visual Communication and Image Representation (IF 2.6) Pub Date: 2020-06-26, DOI: 10.1016/j.jvcir.2020.102846
Min Jiang , Na Pan , Jun Kong

Recently, video action recognition based on two-stream networks has remained a popular research topic in computer vision. However, most current two-stream-based methods suffer from two redundancy issues: inter-frame redundancy and intra-frame redundancy. To address these problems, a Spatial-Temporal Saliency Action Mask Attention network (STSAMANet) is built for action recognition. First, this paper introduces a key-frame mechanism to eliminate inter-frame redundancy. This mechanism computes key frames for each video sequence so that the difference between the selected frames is maximized. Then, Mask R-CNN detection is introduced to build a saliency attention layer that eliminates intra-frame redundancy. This layer focuses on the salient human body and objects for each action class. We experiment on two public video action datasets, the UCF101 dataset and the Penn Action dataset, to verify the effectiveness of our method for action recognition.
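The abstract does not spell out the key-frame criterion, so the following is a minimal sketch of one plausible reading, assuming key frames are picked greedily so that each new frame maximizes its mean pixel difference to the previously selected one; the function name select_key_frames and the greedy rule are illustrative assumptions, not the authors' published procedure.

```python
import numpy as np

def select_key_frames(frames, num_key_frames):
    """Greedy key-frame picker (hypothetical reading of the paper's mechanism).

    frames: np.ndarray of shape (T, H, W, C), a decoded RGB video.
    Each step selects the frame with the largest mean absolute pixel
    difference to the most recently selected frame, keeping the retained
    frames as different from one another as possible.
    """
    selected = [0]  # anchor on the first frame
    while len(selected) < min(num_key_frames, len(frames)):
        last = frames[selected[-1]].astype(np.float32)
        diffs = np.abs(frames.astype(np.float32) - last).mean(axis=(1, 2, 3))
        diffs[selected] = -1.0  # never re-select an already chosen frame
        selected.append(int(diffs.argmax()))
    return frames[np.sort(selected)]
```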
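Similarly, the saliency attention layer is only described at a high level. Below is a sketch of how an off-the-shelf Mask R-CNN (here torchvision's pretrained model, standing in for whatever detector the paper actually uses) could zero out non-salient background pixels before a frame enters the two-stream network; the score threshold and fallback behavior are assumptions.

```python
import torch
import torchvision

# Pretrained detector used purely for illustration; the paper's exact
# model, classes, and score threshold are not given in the abstract.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def saliency_mask(frame, score_thresh=0.5):
    """Keep detected people/objects, suppress everything else.

    frame: float tensor of shape (3, H, W) with values in [0, 1].
    """
    with torch.no_grad():
        output = model([frame])[0]
    keep = output["scores"] > score_thresh
    if not keep.any():
        return frame  # no confident detection: fall back to the raw frame
    masks = output["masks"][keep, 0] > 0.5      # (N, H, W) boolean masks
    union = masks.any(dim=0).to(frame.dtype)    # union of all instances
    return frame * union.unsqueeze(0)           # broadcast over channels
```

Masking rather than cropping preserves the spatial layout of the frame, which matters when the masked frames feed a convolutional stream.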



