MAIN: Multi-Attention Instance Network for video segmentation
Computer Vision and Image Understanding (IF 4.3). Pub Date: 2021-06-24. DOI: 10.1016/j.cviu.2021.103240
Juan León Alcázar, María A. Bravo, Guillaume Jeanneret, Ali K. Thabet, Thomas Brox, Pablo Arbeláez, Bernard Ghanem

Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Network (MAIN), handles challenging segmentation scenarios in arbitrary videos without modeling sequence- or instance-specific knowledge. We design MAIN to segment multiple instances in a single forward pass, and optimize it with a novel loss function that favors class-agnostic predictions and assigns instance-specific penalties. We achieve state-of-the-art performance on the challenging YouTube-VOS dataset and benchmark, improving the unseen Jaccard and F-measure by 6.8% and 12.7%, respectively, while operating in real time (30.3 FPS).
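The benchmark figures above refer to the standard YouTube-VOS evaluation measures: region similarity J (the Jaccard index, i.e. intersection-over-union between predicted and ground-truth masks) and contour accuracy F (an F-measure over boundary pixels), each reported separately for object categories seen and unseen during training. The sketch below is a minimal NumPy/SciPy illustration of both measures for a single binary mask pair; it is an approximation for clarity, not the official benchmark code, and the helper names (`jaccard`, `boundary_f_measure`) and the `tol` boundary-matching tolerance are our own assumptions.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: define J = 1
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def boundary_f_measure(pred: np.ndarray, gt: np.ndarray, tol: int = 2) -> float:
    """Simplified contour accuracy F: precision/recall between mask
    boundaries, matched within a tol-pixel tolerance band (illustrative
    stand-in for the benchmark's contour matching)."""
    def boundary(mask: np.ndarray) -> np.ndarray:
        mask = mask.astype(bool)
        return mask ^ binary_erosion(mask)  # pixels removed by 1-px erosion

    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:  # no boundaries in either mask
        return 1.0
    # Dilate each boundary into a tolerance band for matching.
    struct = np.ones((2 * tol + 1, 2 * tol + 1), dtype=bool)
    precision = (pb & binary_dilation(gb, struct)).sum() / max(pb.sum(), 1)
    recall = (gb & binary_dilation(pb, struct)).sum() / max(gb.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

# Toy usage: two overlapping square masks.
p = np.zeros((64, 64), dtype=bool); p[10:30, 10:30] = True
g = np.zeros((64, 64), dtype=bool); g[12:32, 12:32] = True
print(jaccard(p, g), boundary_f_measure(p, g))
```

On the benchmark these per-mask scores are averaged over all annotated instances and frames, which is why a method that segments every instance in one forward pass, as MAIN does, must keep the per-instance predictions well separated.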



Updated: 2021-07-04