Multi-Stage Feature Fusion Network for Video Super-Resolution
IEEE Transactions on Image Processing (IF 10.8), Pub Date: 2021-02-09, DOI: 10.1109/tip.2021.3056868
Huihui Song, Wenjie Xu, Dong Liu, Bo Liu, Qingshan Liu, Dimitris N. Metaxas

Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) frame from its corresponding low-resolution (LR) frame (the reference frame) together with multiple neighboring frames (the supporting frames). An important step in VSR is to fuse the features of the reference frame with those of the supporting frames. The major issue with existing VSR methods is that this fusion is conducted in a single stage, so the fused feature may deviate greatly from the visual information in the original LR reference frame. In this paper, we propose an end-to-end Multi-Stage Feature Fusion Network that fuses the temporally aligned features of the supporting frames with the spatial feature of the original reference frame at different stages of a feed-forward neural network architecture. In our network, the Temporal Alignment Branch serves as an inter-frame temporal alignment module that mitigates the misalignment between the supporting frames and the reference frame. Specifically, we apply multi-scale dilated deformable convolution as the basic operation to generate temporally aligned features of the supporting frames. The other branch of our network, the Modulative Feature Fusion Branch, accepts the temporally aligned feature maps as a conditional input and modulates the feature of the reference frame at different stages of the branch backbone. This allows the reference-frame feature to be consulted at each stage of the fusion process, yielding an enhanced feature for the LR-to-HR reconstruction. Experimental results on several benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance on the VSR task.
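The stage-wise modulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature shapes, the pointwise "convolutions", the scale/shift (SFT-style) form of the modulation, and the number of stages are all illustrative assumptions. The point is only the control flow — the aligned supporting-frame feature conditions each stage, while the reference-frame feature is re-modulated stage by stage rather than fused once.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w, b):
    # Pointwise "convolution": per-pixel linear map over channels.
    # x: (C_in, H, W), w: (C_out, C_in), b: (C_out,)
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

def modulate(ref_feat, aligned_feat, params):
    # One fusion stage: predict a scale (gamma) and shift (beta) from the
    # temporally aligned feature and apply them to the reference feature.
    gamma = conv1x1(aligned_feat, params["wg"], params["bg"])
    beta = conv1x1(aligned_feat, params["wb"], params["bb"])
    return ref_feat * (1.0 + gamma) + beta

C, H, W, stages = 8, 4, 4, 3  # illustrative sizes, not the paper's
ref = rng.standard_normal((C, H, W))       # reference-frame feature
aligned = rng.standard_normal((C, H, W))   # aligned supporting-frame feature

feat = ref
for _ in range(stages):
    # Each stage has its own (randomly initialized) modulation weights.
    params = {
        "wg": rng.standard_normal((C, C)) * 0.01, "bg": np.zeros(C),
        "wb": rng.standard_normal((C, C)) * 0.01, "bb": np.zeros(C),
    }
    feat = modulate(feat, aligned, params)

print(feat.shape)  # (8, 4, 4)
```

Because the modulation is conditioned on the aligned feature but applied multiplicatively and additively to the reference feature, the reference-frame information remains in the pipeline at every stage, which is the property the abstract emphasizes over one-shot fusion.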

Updated: 2021-02-16