Dual Attention with the Self-Attention Alignment for Efficient Video Super-resolution
Cognitive Computation (IF 4.3), Pub Date: 2021-05-15, DOI: 10.1007/s12559-021-09874-1
Yuezhong Chu, Yunan Qiao, Heng Liu, Jungong Han

By selectively enhancing the features extracted from convolutional networks, the attention mechanism has shown its effectiveness for low-level vision tasks, especially image super-resolution (SR). However, due to the spatiotemporal continuity of video sequences, simply applying image attention to video does not yield good SR results. At present, there is still a lack of a suitable attention structure for efficient video SR. In this work, building upon dual attention, i.e., position attention and channel attention, we propose deep dual attention underpinned by self-attention alignment (DASAA) for video SR. Specifically, we start by constructing a dual attention module (DAM) to strengthen the acquired spatiotemporal features and adopt a self-attention structure with a morphological mask to achieve attention alignment. Then, on top of the attention features, we apply an up-sampling operation to reconstruct the super-resolved video frames and introduce an LSTM (long short-term memory) network to guarantee the coherent consistency of the generated video frames both temporally and spatially. Experimental results and comparisons on the real-world Youku-VESR dataset and the standard benchmark dataset Vimeo-90K demonstrate that our proposed approach achieves the best video SR quality while requiring the least computation. Specifically, on the Youku-VESR dataset, our approach achieves a test PSNR of 35.290 dB and an SSIM of 0.939; on the Vimeo-90K dataset, the PSNR/SSIM values of our approach are 32.878 dB and 0.774. Moreover, the FLOPs (floating-point operations) of our approach are as low as 6.39 G. The proposed DASAA method surpasses all video SR algorithms in the comparison. The results also reveal that there is no linear relationship between position attention and channel attention. This suggests that our DASAA with its LSTM coherent-consistency architecture may have great potential for many low-level-vision video applications.
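The abstract does not include implementation details, but the dual attention idea it describes (position attention plus channel attention over convolutional features) can be illustrated with a minimal PyTorch-style sketch. The module and parameter names below are assumptions for illustration only and do not reproduce the authors' DASAA code, mask-based alignment, or LSTM components.

```python
# Minimal sketch of a dual attention module: spatial (position) attention and
# channel attention over a feature map, fused additively. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    """Self-attention over spatial positions of a feature map."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (B, HW, C//8)
        k = self.key(x).flatten(2)                           # (B, C//8, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)            # (B, HW, HW)
        v = self.value(x).flatten(2)                         # (B, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Self-attention over channels of a feature map."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        feat = x.flatten(2)                                  # (B, C, HW)
        attn = F.softmax(torch.bmm(feat, feat.transpose(1, 2)), dim=-1)  # (B, C, C)
        out = torch.bmm(attn, feat).view(b, c, h, w)
        return self.gamma * out + x

class DualAttentionModule(nn.Module):
    """Combines position and channel attention, in the spirit of the DAM above."""
    def __init__(self, channels):
        super().__init__()
        self.pos = PositionAttention(channels)
        self.chn = ChannelAttention()

    def forward(self, x):
        # Additive fusion of the two attention branches (an assumption here).
        return self.pos(x) + self.chn(x)

if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)            # features from a conv backbone
    print(DualAttentionModule(64)(feats).shape)   # torch.Size([1, 64, 32, 32])
```

In a full video SR pipeline such a module would sit between the spatiotemporal feature extractor and the up-sampling/reconstruction stage, with alignment and temporal consistency handled separately, as the abstract outlines.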




Updated: 2021-05-15