Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence

Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2018-08-17, DOI: 10.1109/tpami.2018.2865794
Bo Xiong, Suyog Dutt Jain, Kristen Grauman

We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all “object-like” regions—even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream, and the network learns to combine both appearance and motion and attempts to extract all prominent objects whether they are moving or not. Beyond the core model, a second contribution of our approach is how it leverages varying strengths of training annotations. Pixel-level annotations are quite difficult to obtain, yet crucial for training a deep network approach for segmentation. Thus we propose ways to exploit weakly labeled data for learning dense foreground segmentation. For images, we show the value in mixing object category examples with image-level labels together with relatively few images with boundary-level annotations. For video, we show how to bootstrap weakly annotated videos together with the network trained for image segmentation. Through experiments on multiple challenging image and video segmentation benchmarks, our method offers consistently strong results and improves the state-of-the-art for fully automatic segmentation of generic (unseen) objects. In addition, we demonstrate how our approach benefits image retrieval and image retargeting, both of which flourish when given our high-quality foreground maps. Code, models, and videos are at: http://vision.cs.utexas.edu/projects/pixelobjectness/.
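
The per-pixel object/background labeling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that some network (not shown) has already produced two-channel score maps, and the names `pixel_objectness_mask`, `appearance_logits`, and `motion_logits` are hypothetical. The fusion step here is a hand-written maximum over the two streams, used only for illustration; in the paper the network learns how to combine appearance and motion.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pixel_objectness_mask(appearance_logits, motion_logits=None, threshold=0.5):
    """Turn per-pixel scores into a boolean object/background mask.

    Both inputs are hypothetical network outputs of shape (2, H, W):
    channel 0 scores "background", channel 1 scores "object".
    """
    # Probability that each pixel belongs to an object, from appearance alone.
    prob = softmax(appearance_logits, axis=0)[1]
    if motion_logits is not None:
        motion_prob = softmax(motion_logits, axis=0)[1]
        # Assumption: a fixed max-fusion stands in for the learned fusion.
        # A pixel counts as object-like if either stream says so, which keeps
        # static objects (appearance) as well as moving ones (motion).
        prob = np.maximum(prob, motion_prob)
    return prob > threshold


# Toy 2x2 example: appearance finds an object at (0, 1), motion at (1, 0).
appearance = np.array([[[2.0, 0.0], [2.0, 2.0]],   # background scores
                       [[0.0, 3.0], [0.0, 0.0]]])  # object scores
motion = np.array([[[2.0, 2.0], [0.0, 2.0]],
                   [[0.0, 0.0], [4.0, 0.0]]])

image_mask = pixel_objectness_mask(appearance)
video_mask = pixel_objectness_mask(appearance, motion)
```

With these toy inputs, `image_mask` marks only the pixel found by the appearance scores, while `video_mask` also includes the pixel flagged by the motion scores.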

Updated: 2024-08-22
