SAPS: Self-Attentive Pathway Search for weakly-supervised action localization with background-action augmentation
Computer Vision and Image Understanding ( IF 4.3 ) Pub Date : 2021-08-03 , DOI: 10.1016/j.cviu.2021.103256
Xiao-Yu Zhang, Yaru Zhang, Haichao Shi, Jing Dong

Weakly supervised temporal action localization is a challenging computer vision task that aims to derive frame-level action identifiers from video-level supervision alone. The attention mechanism is a widely used paradigm for action recognition and localization in recent methods. However, existing attention-based methods mostly focus on capturing the global dependency of the frame sequence while ignoring local inter-frame distances. Moreover, during background modeling, diverse background contents are typically lumped into a single category, which inevitably weakens the discriminative ability of classifiers and introduces irrelevant noise. In this paper, we present a novel self-attentive pathway search framework, namely SAPS, to address the above challenges. To achieve a comprehensive representation with discriminative attention weights, we design a NAS-based attentive module with a path-level searching process and construct a competitive attention structure that captures both local and global dependency. Furthermore, we propose action-related background modeling for robust background-action augmentation, where knowledge derived from the background provides informative clues for action recognition. An ensemble T-CAM operation is subsequently designed to incorporate this background information and further refine the temporal action localization results. Extensive experiments on two benchmark datasets (i.e., THUMOS14 and ActivityNet1.2) clearly corroborate the efficacy of our method.
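The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the window size, the simple averaging of the local and global branches, and all function names are hypothetical; it only shows how a locality mask changes self-attention's dependency structure, and how a T-CAM (temporal class activation map) applies a video-level classifier frame-wise.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames, mask=None):
    """Scaled dot-product self-attention over T frame features (T, D).
    With no mask every frame attends to every other frame (global
    dependency); a boolean mask restricts attention to a local window."""
    d = frames.shape[-1]
    scores = frames @ frames.T / np.sqrt(d)   # (T, T) pairwise similarity
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block out-of-window frames
    return softmax(scores, axis=-1) @ frames

def local_mask(T, window):
    """True where two frame indices are within `window` of each other."""
    idx = np.arange(T)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def tcam(frames, w_cls):
    """Temporal class activation map: per-frame class scores (T, C)
    from applying the video-level classifier weights at every frame."""
    return frames @ w_cls

rng = np.random.default_rng(0)
T, D, C = 8, 16, 4
x = rng.standard_normal((T, D))
g = self_attention(x)                        # global dependency branch
l = self_attention(x, local_mask(T, 2))      # local dependency branch
scores = tcam(0.5 * (g + l), rng.standard_normal((D, C)))  # (T, C)
```

Thresholding `scores` along the temporal axis for a given class then yields candidate action segments; the paper's ensemble T-CAM additionally folds in the background model before this step.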




Updated: 2021-08-09