Object-centric Video Prediction without Annotation
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-05-06 , DOI: arxiv-2105.02799
Karl Schmeckpeper, Georgios Georgakis, Kostas Daniilidis

In order to interact with the world, agents must be able to predict the results of the world's dynamics. A natural approach to learning these dynamics is video prediction, as cameras are ubiquitous and powerful sensors. Direct pixel-to-pixel video prediction is difficult, does not take advantage of known priors, and does not provide an easy interface to utilize the learned dynamics. Object-centric video prediction offers a solution to these problems by taking advantage of the simple prior that the world is made of objects and by providing a more natural interface for control. However, existing object-centric video prediction pipelines require dense object annotations in training video sequences. In this work, we present Object-centric Prediction without Annotation (OPA), an object-centric video prediction method that takes advantage of priors from powerful computer vision models. We validate our method on a dataset composed of video sequences of stacked objects falling, and demonstrate how to adapt a perception model in an environment through end-to-end video prediction training.
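The pipeline the abstract describes factors video prediction into perception (frames to per-object states) and dynamics (rollout in state space), rather than predicting pixels directly. A minimal sketch of that factorization is below; the function names, the centroid-based object state, and the constant-velocity dynamics are illustrative stand-ins, not the paper's learned models, which the abstract says are trained end-to-end from a pretrained perception prior.

```python
# Illustrative sketch only: OPA uses a learned perception model and a learned
# dynamics model trained end-to-end. Here, hypothetical stand-ins show the
# object-centric structure: frames are encoded into per-object states,
# dynamics are predicted in state space, and predicted states could then be
# decoded back to pixels.

def extract_object_states(detections):
    """Perception stand-in: map each detected object to a state vector.

    `detections` is a list of (x, y) centroids, one per object, as a
    fixed off-the-shelf perception model might produce from a raw frame."""
    return [tuple(map(float, d)) for d in detections]

def predict_next_states(prev_states, curr_states):
    """Dynamics stand-in: constant-velocity extrapolation per object."""
    return [(cx + (cx - px), cy + (cy - py))
            for (px, py), (cx, cy) in zip(prev_states, curr_states)]

# Two observed "frames" of three falling objects (pixel centroids).
frame_t0 = extract_object_states([(10, 5), (30, 5), (50, 5)])
frame_t1 = extract_object_states([(10, 9), (30, 9.5), (50, 8)])

# Predict each object's state in the next frame from its own trajectory.
frame_t2 = predict_next_states(frame_t0, frame_t1)
print(frame_t2)  # [(10.0, 13.0), (30.0, 14.0), (50.0, 11.0)]
```

Because each object carries its own state, the prediction interface is per-object, which is what makes this representation convenient for downstream control.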

Updated: 2021-05-07