Self Supervised Progressive Network for High Performance Video Object Segmentation

IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2022-11-16, DOI: 10.1109/tnnls.2022.3219936
Guorong Li₁, Dexiang Hong₁, Kai Xu₁, Bineng Zhong₂, Li Su₁, Zhenjun Han₃, Qingming Huang₁

Self-supervised video object segmentation (VOS) has recently attracted much interest. However, most proxy tasks are designed to train only a single backbone, which relies on a point-to-point correspondence strategy to propagate masks through a video sequence. Because this pipeline is so simple, the performance of the single-backbone paradigm remains unsatisfactory. We instead propose a self-supervised progressive network (SSPNet), which consists of a memory retrieval module (MRM) and a collaborative refinement module (CRM). The MRM performs point-to-point correspondence and produces a propagated coarse mask for a query frame through self-supervised pixel-level and frame-level similarity learning. The CRM, which is trained via cycle-consistency region tracking, aggregates the reference and query information and implicitly learns the collaborative relationship between them to refine the coarse mask. Furthermore, to learn semantic knowledge from unlabeled data, we design two novel mask-generation strategies that provide the CRM's training data with meaningful semantic information. Extensive experiments on DAVIS-17, YouTube-VOS, and SegTrack v2 demonstrate that our method surpasses state-of-the-art self-supervised methods and narrows the gap with fully supervised methods.
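
The point-to-point correspondence step mentioned in the abstract can be illustrated with a generic affinity-based label propagation sketch. This is not the authors' MRM: the function name `propagate_mask`, the cosine-similarity affinity, and the `temperature` value are all assumptions chosen for illustration, and plain Python lists stand in for the feature tensors a real model would produce.

```python
import math


def propagate_mask(ref_feats, query_feats, ref_labels, temperature=0.07):
    """Propagate soft labels from reference pixels to query pixels.

    ref_feats:   list of feature vectors, one per reference pixel
    query_feats: list of feature vectors, one per query pixel
    ref_labels:  list of per-class label vectors, one per reference pixel
    temperature: softmax temperature (hypothetical default)
    Returns one soft label vector per query pixel.
    """

    def normalise(vec):
        # Unit-normalise so the dot product below is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    ref = [normalise(v) for v in ref_feats]
    num_classes = len(ref_labels[0])
    propagated = []
    for q in map(normalise, query_feats):
        # Similarity of this query pixel to every reference pixel.
        logits = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref]
        # Numerically stable softmax over the reference pixels.
        top = max(logits)
        weights = [math.exp(value - top) for value in logits]
        total = sum(weights)
        # The query label is the affinity-weighted average of reference labels.
        propagated.append([
            sum(w * lab[k] for w, lab in zip(weights, ref_labels)) / total
            for k in range(num_classes)
        ])
    return propagated
```

Each query pixel receives a weighted average of the reference labels, so the output is a coarse soft mask; in the pipeline described above, a refinement stage such as the CRM would then sharpen it.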

Updated: 2022-11-16
