Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence



IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2018-07-10, DOI: 10.1109/tpami.2018.2854726
Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman

We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model lets us sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing that the network implicitly learns a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.
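
The core idea in the abstract, image content held as feature maps and motion held as convolutional kernels, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `cross_convolve` and `sample_shift_kernel` are hypothetical, and the randomly placed one-hot kernel is only a stand-in assumption for whatever kernels the trained network would produce. It shows how applying a different sampled kernel to the same feature map yields a different candidate "future".

```python
import random


def cross_convolve(feature_maps, kernels):
    """Filter each feature map with its own kernel (zero padding, same size).

    feature_maps: list of C maps, each an H x W list of lists of floats.
    kernels: list of C kernels, each k x k with k odd.
    Implemented as cross-correlation, the usual deep-learning "convolution".
    """
    outputs = []
    for fmap, kernel in zip(feature_maps, kernels):
        h, w = len(fmap), len(fmap[0])
        k = len(kernel)
        r = k // 2
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc = 0.0
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - r, x + dx - r
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += fmap[yy][xx] * kernel[dy][dx]
                out[y][x] = acc
        outputs.append(out)
    return outputs


def sample_shift_kernel(rng, k=3):
    """Assumed stand-in for a learned motion code: a one-hot kernel.

    A one-hot kernel translates the feature map by a small offset, so
    sampling its position samples a motion.
    """
    kernel = [[0.0] * k for _ in range(k)]
    kernel[rng.randrange(k)][rng.randrange(k)] = 1.0
    return kernel


if __name__ == "__main__":
    rng = random.Random(0)
    # One 5 x 5 feature map with a single active cell in the centre.
    fmap = [[0.0] * 5 for _ in range(5)]
    fmap[2][2] = 1.0
    # Draw several motion kernels for the same input: several possible futures.
    for _ in range(3):
        future = cross_convolve([fmap], [sample_shift_kernel(rng)])[0]
        for row in future:
            print(" ".join(str(int(v)) for v in row))
        print()
```

Because the kernel differs per sample rather than being a fixed learned weight, the same feature map can be moved in many ways, which is the property the probabilistic formulation relies on.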


Updated: 2019-08-06
