Weakly Supervised Action Selection Learning in Video,arXiv - CS - Computer Vision and Pattern Recognition


Weakly Supervised Action Selection Learning in Video
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-05-06, DOI: arxiv-2105.02439
Junwei Ma, Satya Krishna Gorti, Maksims Volkovs, Guangwei Yu

Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the absence of frame-level annotations causes the classifier to impart class bias on every frame. To address this, we propose the Action Selection Learning (ASL) approach to capture the general concept of action, a property we refer to as "actionness". Under ASL, the model is trained with a novel class-agnostic task to predict which frames will be selected by the classifier. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 10.3% and 5.7% relative improvement, respectively. We further analyze the properties of ASL and demonstrate the importance of actionness. Full code for this work is available here: https://github.com/layer6ai-labs/ASL.
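The two ideas in the abstract, selecting the highest-scoring frames to form a video-level prediction and then asking which frames were selected, can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes a fixed number of selected frames `k`, mean pooling over the selected frames, and made-up frame scores; the function names are hypothetical.

```python
def top_k_indices(scores, k):
    """Return the indices of the k highest scores, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])


def video_level_score(frame_scores, k):
    """Assumed pooling rule: average the k highest frame scores for one class."""
    selected = top_k_indices(frame_scores, k)
    return sum(frame_scores[t] for t in selected) / k


def selection_targets(frame_scores, k):
    """Class-agnostic binary targets: 1 if the classifier selected the frame."""
    selected = set(top_k_indices(frame_scores, k))
    return [1 if t in selected else 0 for t in range(len(frame_scores))]


# Hypothetical per-frame probabilities for one class in a six-frame video.
frame_scores = [0.10, 0.80, 0.90, 0.20, 0.70, 0.05]
k = 3  # assumed number of frames to select

video_score = video_level_score(frame_scores, k)
targets = selection_targets(frame_scores, k)
print(video_score, targets)
```

In this sketch the targets depend only on whether a frame was selected, not on which class selected it. That mirrors the class-agnostic character of the task described above; how the targets are actually built and used in training is specified in the paper and its code, not here.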


Updated: 2021-05-07
