Fast pixel-matching for video object segmentation, Signal Processing: Image Communication


Fast pixel-matching for video object segmentation
Signal Processing: Image Communication (IF 3.4), Pub Date: 2021-07-22, DOI: 10.1016/j.image.2021.116373
Siyue Yu ₁, Jimin Xiao ₁, Bingfeng Zhang ₁, Eng Gee Lim ₁, Yao Zhao ₂


Video object segmentation, which aims to segment foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur a high computational cost because they fine-tune the model during inference. Most mask-propagation-based models are faster but perform relatively poorly because they fail to adapt to variation in object appearance. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask propagation and a non-local technique by matching pixels in the reference and target frames. Because we bring in information from both the first and previous frames, our network is robust to large variation in object appearance and can better adapt to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance among methods compared at the same level while remaining fast (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame). Source code is available at https://github.com/siyueyu/NPMCA-net.
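The abstract does not spell out how pixels are matched, so the following is only a minimal sketch of generic non-local (attention-style) pixel matching, not the NPMCA-net implementation. It assumes each pixel is already described by a feature vector, uses dot-product affinity with a softmax over reference pixels, and propagates the reference mask as a weighted average. The function names `match_pixels` and `iou`, the `temperature` parameter, and the toy feature values are all hypothetical and chosen for illustration. Under these assumptions, reference pixels from the first frame and the previous frame could simply be concatenated into one reference set.

```python
import math


def match_pixels(target_feats, ref_feats, ref_mask, temperature=1.0):
    """Propagate a soft foreground mask from reference pixels to target pixels.

    target_feats: list of feature vectors, one per target-frame pixel
    ref_feats:    list of feature vectors, one per reference pixel
    ref_mask:     list of foreground probabilities in [0, 1], one per reference pixel
    """
    if len(ref_feats) != len(ref_mask):
        raise ValueError("ref_feats and ref_mask must have the same length")
    soft_mask = []
    for t in target_feats:
        # Dot-product affinity between this target pixel and every reference pixel.
        scores = [sum(a * b for a, b in zip(t, r)) / temperature for r in ref_feats]
        # Numerically stable softmax over the reference pixels.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Attention-weighted average of the reference mask values.
        soft_mask.append(sum(w * p for w, p in zip(weights, ref_mask)) / total)
    return soft_mask


def iou(pred, truth, threshold=0.5):
    """Intersection over union of two masks after thresholding."""
    p = [x >= threshold for x in pred]
    g = [x >= threshold for x in truth]
    inter = sum(1 for a, b in zip(p, g) if a and b)
    union = sum(1 for a, b in zip(p, g) if a or b)
    # Two empty masks are treated as a perfect match (a convention assumed here).
    return inter / union if union else 1.0


# Toy data (hypothetical): one foreground and one background reference pixel
# from the first frame, plus the same pair from the previous frame.
first_feats, first_mask = [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]
prev_feats, prev_mask = [[0.9, 0.1], [0.1, 0.9]], [1.0, 0.0]
ref_feats = first_feats + prev_feats
ref_mask = first_mask + prev_mask

target_feats = [[5.0, 0.0], [0.0, 5.0]]
soft = match_pixels(target_feats, ref_feats, ref_mask)
score = iou(soft, [1.0, 0.0])
```

In this toy case the first target pixel resembles the foreground references and the second resembles the background references, so the propagated mask thresholds to the expected labels. A real system would compute the features with a convolutional backbone and operate on full feature maps rather than Python lists.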


Updated: 2021-07-22
