MAIN: Multi-Attention Instance Network for video segmentation
Computer Vision and Image Understanding (IF 4.3). Pub Date: 2021-06-24. DOI: 10.1016/j.cviu.2021.103240
Juan León Alcázar, María A. Bravo, Guillaume Jeanneret, Ali K. Thabet, Thomas Brox, Pablo Arbeláez, Bernard Ghanem

Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Network (MAIN), handles challenging segmentation scenarios in arbitrary videos without modeling sequence- or instance-specific knowledge. We design MAIN to segment multiple instances in a single forward pass, and optimize it with a novel loss function that favors class-agnostic predictions and assigns instance-specific penalties. We achieve state-of-the-art performance on the challenging YouTube-VOS dataset and benchmark, improving the unseen Jaccard and F-measure by 6.8% and 12.7%, respectively, while operating in real time (30.3 FPS).
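The benchmark figures above refer to the standard YouTube-VOS evaluation measures: region similarity J (the Jaccard index, i.e. intersection-over-union between predicted and ground-truth masks) and contour accuracy F (an F-measure over boundary pixels), each reported separately for object categories seen and unseen during training. The sketch below is a minimal NumPy/SciPy illustration of both measures for a single binary mask pair; it is an approximation for clarity, not the official benchmark code, and the helper names (`jaccard`, `boundary_f_measure`) and the `tol` boundary-matching tolerance are our own assumptions.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: define J = 1
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def boundary_f_measure(pred: np.ndarray, gt: np.ndarray, tol: int = 2) -> float:
    """Simplified contour accuracy F: precision/recall between mask
    boundaries, matched within a tol-pixel tolerance band (illustrative
    stand-in for the benchmark's contour matching)."""
    def boundary(mask: np.ndarray) -> np.ndarray:
        mask = mask.astype(bool)
        return mask ^ binary_erosion(mask)  # pixels removed by 1-px erosion

    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:  # no boundaries in either mask
        return 1.0
    # Dilate each boundary into a tolerance band for matching.
    struct = np.ones((2 * tol + 1, 2 * tol + 1), dtype=bool)
    precision = (pb & binary_dilation(gb, struct)).sum() / max(pb.sum(), 1)
    recall = (gb & binary_dilation(pb, struct)).sum() / max(gb.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

# Toy usage: two overlapping square masks.
p = np.zeros((64, 64), dtype=bool); p[10:30, 10:30] = True
g = np.zeros((64, 64), dtype=bool); g[12:32, 12:32] = True
print(jaccard(p, g), boundary_f_measure(p, g))
```

On the benchmark these per-mask scores are averaged over all annotated instances and frames, which is why a method that segments every instance in one forward pass, as MAIN does, must keep the per-instance predictions well separated.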



Updated: 2021-07-04