Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2018-08-17. DOI: 10.1109/tpami.2018.2865794
Bo Xiong, Suyog Dutt Jain, Kristen Grauman

We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions, even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream, and the network learns to combine both appearance and motion and attempts to extract all prominent objects whether they are moving or not. Beyond the core model, a second contribution of our approach is how it leverages varying strengths of training annotations. Pixel-level annotations are quite difficult to obtain, yet crucial for training a deep network approach for segmentation. Thus we propose ways to exploit weakly labeled data for learning dense foreground segmentation. For images, we show the value in mixing object category examples with image-level labels together with relatively few images with boundary-level annotations. For video, we show how to bootstrap weakly annotated videos together with the network trained for image segmentation. Through experiments on multiple challenging image and video segmentation benchmarks, our method offers consistently strong results and improves the state-of-the-art for fully automatic segmentation of generic (unseen) objects. In addition, we demonstrate how our approach benefits image retrieval and image retargeting, both of which flourish when given our high-quality foreground maps. Code, models, and videos are at: http://vision.cs.utexas.edu/projects/pixelobjectness/.
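The abstract does not spell out a concrete architecture, so the following is only a minimal PyTorch sketch of the setup it describes: a fully convolutional network emitting per-pixel object/background logits (the structured prediction), plus an optional motion stream whose evidence is combined with appearance for video. All class names, layer sizes, and the simple logit-sum fusion here are hypothetical illustrations, not the authors' model; their actual code is available at the project page linked above.

```python
# Illustrative sketch only: a tiny two-stream "pixel objectness" style
# network. The real system uses a much stronger pretrained backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamFCN(nn.Module):
    """Tiny fully convolutional stream: input map -> 2-channel logit map."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(64, 2, 1)  # object vs. background score

    def forward(self, x):
        h, w = x.shape[-2:]
        logits = self.classifier(self.features(x))
        # Upsample back to input resolution for a dense per-pixel prediction.
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)

class TwoStreamObjectness(nn.Module):
    """Appearance stream (RGB frame) plus motion stream (e.g. a 2-channel
    optical-flow field); here fused by simply summing logits, a stand-in
    for whatever learned fusion the full model uses."""
    def __init__(self):
        super().__init__()
        self.appearance = StreamFCN(in_channels=3)
        self.motion = StreamFCN(in_channels=2)

    def forward(self, rgb, flow=None):
        logits = self.appearance(rgb)
        if flow is not None:              # video case: add motion evidence
            logits = logits + self.motion(flow)
        return logits                     # (N, 2, H, W) object/background

# Training signal: per-pixel cross-entropy against a binary foreground mask.
model = TwoStreamObjectness()
rgb = torch.randn(1, 3, 128, 128)
flow = torch.randn(1, 2, 128, 128)
mask = torch.randint(0, 2, (1, 128, 128))  # 0 = background, 1 = object
loss = F.cross_entropy(model(rgb, flow), mask)
```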

Updated: 2019-10-23