Action recognition in still images using a multi-attention guided network with weakly supervised saliency detection,Multimedia Tools and Applications

Action recognition in still images using a multi-attention guided network with weakly supervised saliency detection
Multimedia Tools and Applications (IF 3.0), Pub Date: 2021-07-31, DOI: 10.1007/s11042-021-11215-1
Seyed Sajad Ashrafi, Shahriar B. Shokouhi, Ahmad Ayatollahi

Action recognition in still images is an active topic in computer vision. One of the most important problems in still image-based action recognition is the lack of temporal information. At the same time, other problems such as cluttered backgrounds and diverse objects make the recognition task more challenging. However, each action image may contain several salient regions, and exploiting them could improve recognition performance. Moreover, since there is no unique and clear definition of these salient regions in action recognition images, obtaining reliable ground-truth salient regions is highly challenging. This paper presents a multi-attention guided network with weakly supervised detection of multiple salient regions for action recognition. A teacher-student structure is used to guide the attention of the student model toward the salient regions. The teacher network, with a Salient Region Proposal (SRP) module, generates weakly supervised data for the student network in the training phase. The student network, with a Multi-ATtention (MAT) module, proposes multiple salient regions and predicts the actions based on the information found in the evaluation phase. The proposed method obtains mean Average Precision (mAP) values of 94.2% and 93.80% on the Stanford-40 Actions and PASCAL VOC2012 datasets, respectively. The experimental results, based on the ResNet-50 architecture, show that the proposed method outperforms existing methods on the Stanford-40 and VOC2012 datasets. We have also made a major modification to the BU101 dataset; the modified version is now publicly available. The proposed method achieves an mAP of 90.16% on the new BU101 dataset.
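The abstract reports its results as mean Average Precision (mAP). As a reading aid, the following is a minimal sketch of one common way to compute mAP for a multi-class recognition task: the Average Precision of each class is computed from the ranked prediction scores, and the per-class values are then averaged. The function names and the non-interpolated AP variant are assumptions made for illustration; the abstract does not state which evaluation protocol the authors used, and benchmark toolkits may apply a different interpolation.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class (illustrative helper, not the authors' code).

    scores: predicted confidence for the class, one per image.
    labels: 1 if the image belongs to the class, else 0.
    """
    # Rank images by descending confidence for this class.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision measured at the rank of each positive image.
            precisions.append(hits / rank)
    # A class with no positive images contributes 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    label_matrix: Sequence[Sequence[int]],
) -> float:
    """Average the per-class AP; rows are images, columns are classes."""
    num_classes = len(score_matrix[0])
    per_class = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes
```

With this definition, a class whose positive images are all ranked above its negative images has an AP of 1.0, so mAP rewards ranking quality per class rather than top-1 accuracy alone.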

Updated: 2021-08-01