Guess where? Actor-supervision for spatiotemporal action localization

Computer Vision and Image Understanding (IF 4.3), Pub Date: 2019-12-09, DOI: 10.1016/j.cviu.2019.102886
Victor Escorcia, Cuong D. Dao, Mihir Jain, Bernard Ghanem, Cees Snoek

This paper addresses the problem of spatiotemporal localization of actions in videos. Whereas leading approaches all learn to localize from carefully annotated boxes on training video frames, we adhere to a solution requiring only video class labels. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions, in terms of actor transformations, to localize actions. We make two contributions. First, we propose actor proposals derived from an image-based detector for human and non-human actors, which are linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an actor-based attention mechanism that enables localization from action class labels and actor proposals. It exploits a new actor pooling operation and is end-to-end trainable. Experiments on four action datasets show that actor supervision is state-of-the-art for action localization from video class labels and is even competitive with some box-supervised alternatives.
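To make the idea of attention over actor proposals concrete, here is a minimal sketch, not the authors' implementation. It assumes each linked actor proposal (a "tube") has already been reduced to a fixed-length feature vector, and that a relevance vector and per-class weights have been learned elsewhere. The names `attend_over_tubes`, `relevance_vec`, and `class_weights` are hypothetical. The sketch shows how a video-level class score can be computed from tube features alone, while the attention weights indicate which tube most likely contains the action.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_tubes(tube_features, relevance_vec, class_weights):
    """Attention-weighted pooling over actor tubes (illustrative only).

    tube_features: one feature vector per actor tube (assumed precomputed).
    relevance_vec: hypothetical learned vector that scores each tube.
    class_weights: hypothetical linear classifier, one row per action class.
    Returns (attention, logits, best_tube).
    """
    # Score each tube by its dot product with the relevance vector.
    scores = [
        sum(f * w for f, w in zip(feat, relevance_vec))
        for feat in tube_features
    ]
    # Normalize the scores into attention weights that sum to one.
    attention = softmax(scores)
    # Pool the tube features into a single video-level feature.
    dim = len(tube_features[0])
    pooled = [
        sum(a * feat[d] for a, feat in zip(attention, tube_features))
        for d in range(dim)
    ]
    # Video-level class scores: only these need video class labels to train.
    logits = [sum(p * w for p, w in zip(pooled, row)) for row in class_weights]
    # The tube with the highest attention serves as the localization guess.
    best_tube = max(range(len(attention)), key=attention.__getitem__)
    return attention, logits, best_tube
```

In a trainable version of this sketch, the relevance vector and classifier would be optimized jointly against a video-level classification loss, so no box annotations enter the training signal. The paper's actor pooling operation and Siamese linking step are not reproduced here.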


Updated: 2020-01-04
