Weakly supervised temporal action localization with actionness-guided false positive suppression
Neural Networks (IF 7.8) Pub Date: 2024-04-15, DOI: 10.1016/j.neunet.2024.106307
Zhilin Li, Zilei Wang, Qinying Liu

Weakly supervised temporal action localization aims to locate the temporal boundaries of action instances in untrimmed videos using only video-level labels and to assign each instance its corresponding action category. It is generally solved by a "localization-by-classification" pipeline, which finds action instances by classifying video snippets. However, because this approach optimizes a video-level classification objective, the generated activation sequences often suffer from interference by class-related scenes, producing a large number of false positives in the predictions. Many existing works treat the background as an independent category and force the model to learn to distinguish background snippets; under weak supervision, however, the background information is fuzzy and uncertain, which makes this approach extremely difficult. To alleviate the impact of false positives, we propose a new actionness-guided false positive suppression framework that suppresses false positive backgrounds without introducing a background category. Firstly, we propose a self-training actionness branch that learns class-agnostic actionness and, by ignoring the video labels, minimizes the interference of class-related scene information. Secondly, we propose a false positive suppression module that mines false positive snippets and suppresses them. Finally, we introduce a foreground enhancement module that guides the model to learn the foreground with the help of an attention mechanism and the class-agnostic actionness. Extensive experiments on three benchmarks (THUMOS14, ActivityNet1.2, and ActivityNet1.3) demonstrate the effectiveness of our method in suppressing false positives and show that it achieves state-of-the-art performance. Code: .
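The sketch below illustrates one plausible way the actionness-guided suppression described in the abstract could be wired up in PyTorch: a class activation sequence (CAS) from the classification branch is gated by a class-agnostic actionness score, so snippets that are strongly activated by class-related scenes but judged as background are downweighted. All names, feature dimensions, and the soft-gating choice are illustrative assumptions, not the authors' implementation.

# A minimal, hypothetical sketch based only on the abstract; the class names
# and the gating rule are assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class ActionnessGuidedLocalizer(nn.Module):
    """Predicts a class activation sequence (CAS) and a class-agnostic
    actionness score per snippet, then suppresses class activations on
    snippets the actionness branch considers background (likely false
    positives caused by class-related scenes)."""

    def __init__(self, feat_dim: int = 2048, num_classes: int = 20):
        super().__init__()
        # Classification branch: per-snippet class scores, supervised only
        # by video-level labels in the weakly supervised setting.
        self.cls_head = nn.Conv1d(feat_dim, num_classes, kernel_size=1)
        # Actionness branch: class-agnostic foreground score in [0, 1];
        # in the paper it is learned by self-training, here it is simply
        # a separate head for illustration.
        self.actionness_head = nn.Sequential(
            nn.Conv1d(feat_dim, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor):
        # feats: (batch, feat_dim, T) snippet features, e.g. from a frozen backbone.
        cas = self.cls_head(feats)                # (B, C, T)
        actionness = self.actionness_head(feats)  # (B, 1, T)
        # Soft gating: one plausible realization of "false positive
        # suppression" that needs no explicit background category.
        suppressed_cas = cas * actionness
        return cas, actionness.squeeze(1), suppressed_cas


if __name__ == "__main__":
    model = ActionnessGuidedLocalizer(feat_dim=2048, num_classes=20)
    feats = torch.randn(2, 2048, 100)             # 2 videos, 100 snippets each
    cas, actionness, suppressed = model(feats)
    print(cas.shape, actionness.shape, suppressed.shape)

In the paper, the actionness branch is additionally trained by self-training, and dedicated modules mine false positive snippets and enhance the foreground via attention; the simple gating above stands in for those components only to make the overall data flow concrete.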

Updated: 2024-04-15