Video Object Segmentation without Temporal Information
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2018-05-23, DOI: 10.1109/tpami.2018.2838670
K.-K. Maninis, S. Caelles, Y. Chen, J. Pont-Tuset, L. Leal-Taixe, D. Cremers, L. Van Gool

Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOSS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent single-object video segmentation databases, which show that OSVOSS is both the fastest and most accurate method in the state of the art. Experiments on multi-object video segmentation show that OSVOSS obtains competitive results.
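To make the one-shot idea concrete, the following is a minimal, illustrative sketch rather than the authors' OSVOS-S implementation: a segmentation network with an ImageNet-pretrained backbone is fine-tuned on the single annotated first frame and then applied to every frame independently, so no temporal information is used. The torchvision FCN model and the helper names (first_frame, first_mask, frames) are assumptions made here for illustration only.

```python
# Hypothetical sketch of one-shot video object segmentation (not the authors' code).
# Assumes: first_frame is a normalized float tensor of shape (1, 3, H, W),
# first_mask is a binary float tensor of shape (1, 1, H, W), and frames is a
# list of per-frame tensors shaped like first_frame.
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# An ImageNet-pretrained backbone stands in for the generic semantic prior.
model = fcn_resnet50(weights=None,
                     weights_backbone="IMAGENET1K_V1",
                     num_classes=1).to(device)

def one_shot_finetune(model, first_frame, first_mask, steps=500, lr=1e-4):
    """Fine-tune on the single annotated frame (the 'one shot')."""
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        logits = model(first_frame)["out"]   # (1, 1, H, W) foreground logits
        loss = loss_fn(logits, first_mask)
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def segment_video(model, frames, threshold=0.5):
    """Segment each frame independently; temporal consistency is ignored."""
    model.eval()
    return [(torch.sigmoid(model(f)["out"]) > threshold).float() for f in frames]
```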
