Am I Done? Predicting Action Progress in Videos
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2), Pub Date: 2020-12-17, DOI: 10.1145/3402447
Federico Becattini, Tiberio Uricchio, Lorenzo Seidenari, Lamberto Ballan, Alberto Del Bimbo

In this article, we deal with the problem of predicting action progress in videos. We argue that this is an extremely important task, since it can be valuable for a wide range of interaction applications. To this end, we introduce a novel approach, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution. To provide a general definition of action progress, we ground our work in the linguistics literature, borrowing terms and concepts to understand which actions can be the subject of progress estimation. As a result, we define a categorization of actions and their phases. Motivated by the recent success obtained from the interaction of Convolutional and Recurrent Neural Networks, our model is based on a combination of the Faster R-CNN framework, to make framewise predictions, and LSTM networks, to estimate action progress through time. After introducing two evaluation protocols for the task at hand, we demonstrate the capability of our model to effectively predict action progress on the UCF-101 and J-HMDB datasets.
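The model described above combines a detection framework for framewise predictions with an LSTM for estimating progress over time. Below is a minimal, hypothetical sketch of that idea in PyTorch, not the authors' implementation: the Faster R-CNN detector is stubbed out and replaced by precomputed per-frame region features, and the names ProgressNetSketch, feat_dim, and hidden_dim are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ProgressNetSketch(nn.Module):
    """Sketch: per-frame region features -> LSTM -> progress value in (0, 1)."""

    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 256):
        super().__init__()
        # Stand-in for the per-frame RoI features that the paper obtains from Faster R-CNN.
        self.frame_fc = nn.Linear(feat_dim, hidden_dim)
        # Temporal model over the frame sequence, mirroring the LSTM component described above.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Regress one progress value per frame, squashed to (0, 1).
        self.progress_head = nn.Linear(hidden_dim, 1)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (batch, time, feat_dim) features of the detected action region per frame.
        x = torch.relu(self.frame_fc(roi_feats))
        x, _ = self.lstm(x)
        return torch.sigmoid(self.progress_head(x)).squeeze(-1)  # (batch, time)


if __name__ == "__main__":
    model = ProgressNetSketch()
    clip = torch.randn(2, 16, 1024)  # 2 clips, 16 frames each (dummy features)
    progress = model(clip)           # per-frame progress estimates in (0, 1)
    print(progress.shape)            # torch.Size([2, 16])
```

In this sketch the progress head is trained with a regression loss against ground-truth completion ratios; the actual system additionally localizes the action spatially and temporally, which is omitted here.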

Updated: 2020-12-17