Learning to Segment Moving Objects
International Journal of Computer Vision (IF 11.6), Pub Date: 2018-09-22, DOI: 10.1007/s11263-018-1122-2
Pavel Tokmakov , Cordelia Schmid , Karteek Alahari

We study the problem of segmenting moving objects in unconstrained videos. Given a video, the task is to segment all the objects that exhibit independent motion in at least one frame. We formulate this as a learning problem and design our framework with three cues: (1) independent object motion between a pair of frames, which complements object recognition, (2) object appearance, which helps to correct errors in motion estimation, and (3) temporal consistency, which imposes additional constraints on the segmentation. The framework is a two-stream neural network with an explicit memory module. The two streams encode appearance and motion cues in a video sequence respectively, while the memory module captures the evolution of objects over time, exploiting the temporal consistency. The motion stream is a convolutional neural network trained on synthetic videos to segment independently moving objects in the optical flow field. The module to build a “visual memory” in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. For every pixel in a frame of a test video, our approach assigns an object or background label based on the learned spatio-temporal features as well as the “visual memory” specific to the video. We evaluate our method extensively on three benchmarks, DAVIS, Freiburg-Berkeley motion segmentation dataset and SegTrack. In addition, we provide an extensive ablation study to investigate both the choice of the training data and the influence of each component in the proposed framework.
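The "visual memory" described above is realized with a convolutional recurrent unit, whose hidden state is itself a spatial feature map, so the memory retains the image layout while aggregating evidence over frames. As a hedged illustration only (not the authors' implementation, and with hypothetical single-channel weights and a toy 8×8 input), a minimal convolutional GRU step can be sketched in NumPy:

```python
import numpy as np

def conv2d(x, w):
    # Naive "same"-padded 2-D convolution, single channel.
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvGRUCell:
    """Single-channel convolutional GRU: the hidden state h is a 2-D map,
    so the 'memory' preserves spatial layout across time."""
    def __init__(self, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One kernel per gate, for the input x and the hidden state h.
        self.wz_x, self.wz_h = rng.normal(0, 0.1, (2, k, k))
        self.wr_x, self.wr_h = rng.normal(0, 0.1, (2, k, k))
        self.wo_x, self.wo_h = rng.normal(0, 0.1, (2, k, k))

    def step(self, x, h):
        z = sigmoid(conv2d(x, self.wz_x) + conv2d(h, self.wz_h))  # update gate
        r = sigmoid(conv2d(x, self.wr_x) + conv2d(h, self.wr_h))  # reset gate
        h_tilde = np.tanh(conv2d(x, self.wo_x) + conv2d(r * h, self.wo_h))
        # Convex blend of old memory and candidate state.
        return (1 - z) * h + z * h_tilde

# Run a short sequence of 2-D "feature maps" through the memory.
cell = ConvGRUCell()
h = np.zeros((8, 8))
for t in range(4):
    x = np.random.default_rng(t).normal(size=(8, 8))
    h = cell.step(x, h)
print(h.shape)  # (8, 8): the hidden state keeps the spatial resolution
```

In the paper's full model the cell operates on multi-channel feature maps produced by the appearance and motion streams, and the per-pixel object/background labels are read out from this hidden state.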

Updated: 2018-09-22