VOSTR: Video Object Segmentation via Transferable Representations
International Journal of Computer Vision (IF 19.5), Pub Date: 2020-02-03, DOI: 10.1007/s11263-019-01224-x
Yi-Wen Chen , Yi-Hsuan Tsai , Yen-Yu Lin , Ming-Hsuan Yang

In order to learn video object segmentation models, conventional methods require large amounts of pixel-wise ground-truth annotations. However, collecting such supervised data is time-consuming and labor-intensive. In this paper, we exploit existing annotations in source images and transfer such visual information to segment videos with unseen object categories. Without using any annotations in the target video, we propose a method to jointly mine useful segments and learn feature representations that better adapt to the target frames. The entire process is decomposed into three tasks: (1) refining the responses with fully-connected CRFs, (2) solving a submodular function for selecting object-like segments, and (3) learning a CNN model with a transferable module for adapting categories seen in the source domain to the unseen target video. We present an iterative update scheme among the three tasks to self-learn the final solution for object segmentation. Experimental results on numerous benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms.
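Task (2), selecting object-like segments via a submodular objective, is typically solved greedily, since monotone submodular functions admit a (1 - 1/e) approximation guarantee for greedy maximization. The minimal sketch below illustrates the idea with a facility-location objective over pairwise segment similarities; the objective, the `similarity` input, and all names are illustrative assumptions, as the abstract does not give the paper's exact energy terms.

```python
# Illustrative sketch only: greedy maximization of a facility-location
# objective F(S) = sum_i max_{j in S} similarity[i, j], a classic monotone
# submodular function. This is an assumed stand-in for the paper's actual
# submodular formulation, which the abstract does not specify.
import numpy as np

def greedy_submodular_selection(similarity, budget):
    """Greedily pick up to `budget` representative segments.

    similarity: (n, n) array of pairwise segment similarities.
    Returns the indices of the selected segments.
    """
    n = similarity.shape[0]
    selected = []
    coverage = np.zeros(n)  # best similarity of each segment to the selected set
    for _ in range(budget):
        # Marginal gain of adding each candidate j to the selected set.
        gains = np.maximum(similarity, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[selected] = -np.inf  # never re-pick an already selected segment
        j = int(np.argmax(gains))
        if gains[j] <= 0:          # no remaining candidate improves the objective
            break
        selected.append(j)
        coverage = np.maximum(coverage, similarity[:, j])
    return selected

# Toy usage: 6 candidate segments, pick the 2 most representative ones.
rng = np.random.default_rng(0)
sim = rng.random((6, 6))
sim = (sim + sim.T) / 2            # symmetric pairwise similarities
print(greedy_submodular_selection(sim, budget=2))
```

In the full method, such a selection step would sit inside the iterative loop, alternating with CRF refinement of the responses and re-training of the CNN with the transferable module.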
