Recurrent Temporal Aggregation Framework for Deep Video Inpainting
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2019-12-11, DOI: 10.1109/tpami.2019.2958083
Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Video inpainting aims to fill spatio-temporal holes in videos with plausible content. Despite tremendous progress in deep learning-based inpainting of single images, extending these methods to the video domain remains challenging due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting. In particular, we construct an encoder-decoder model in which the encoder takes multiple reference frames that provide visible pixels revealed by the scene dynamics. These hints are aggregated and fed into the decoder. We apply recurrent feedback in an auto-regressive manner to enforce temporal consistency in the video results. We propose two architectural designs based on this framework. Our first model is a blind video decaptioning network (BVDNet) designed to automatically remove and inpaint text overlays in videos without any mask information. BVDNet won first place in the ECCV ChaLearn 2018 LAP Inpainting Competition Track 2: Video Decaptioning. Second, we propose a network for more general video inpainting (VINet) that handles larger and more arbitrarily shaped holes. Video results demonstrate the advantage of our framework over state-of-the-art methods, both qualitatively and quantitatively. Code is available at https://github.com/mcahny/Deep-Video-Inpainting and https://github.com/shwoo93/video_decaptioning.
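The framework described in the abstract can be summarized in a minimal sketch. The PyTorch code below is an illustrative assumption, not the released implementation: the layer sizes, the max aggregation over reference features, and the five-frame reference window are placeholders for the actual architectural choices (see the linked repositories for the real models). It shows the three ingredients the abstract names: a shared encoder over multiple reference frames, aggregation of the encoded hints, and auto-regressive recurrent feedback of the previous output.

```python
import torch
import torch.nn as nn

class RecurrentInpaintNet(nn.Module):
    """Minimal sketch of recurrent temporal aggregation: a shared encoder
    reads several reference frames, their features are aggregated, and a
    decoder produces the current frame while recurrent feedback of the
    previous output enforces temporal consistency. Sizes are illustrative."""

    def __init__(self, ch=64):
        super().__init__()
        # Shared encoder applied to each RGB reference frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Fuses aggregated reference features with the feedback features.
        self.fuse = nn.Conv2d(ch * 2, ch, 3, padding=1)
        # Decoder upsamples fused features back to an RGB frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
        )

    def forward(self, refs, prev_out):
        # refs: (B, T, 3, H, W) reference frames; prev_out: (B, 3, H, W).
        feats = torch.stack(
            [self.encoder(refs[:, t]) for t in range(refs.size(1))], dim=1)
        agg = feats.max(dim=1).values   # placeholder aggregation of visible hints
        fb = self.encoder(prev_out)     # recurrent feedback from previous output
        return self.decoder(self.fuse(torch.cat([agg, fb], dim=1)))

# Auto-regressive inference over a clip: each output feeds the next step.
net = RecurrentInpaintNet()
frames = torch.randn(1, 10, 3, 64, 64)  # dummy corrupted clip
prev = frames[:, 0]
outputs = []
for t in range(frames.size(1)):
    lo, hi = max(0, t - 2), min(frames.size(1), t + 3)
    prev = net(frames[:, lo:hi], prev)  # neighboring frames as references
    outputs.append(prev)
```

The loop is what makes the model auto-regressive: the previous inpainted frame, not the previous corrupted frame, is fed back, so temporal consistency is propagated through the clip.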

Updated: 2020-04-22