An explicit self-attention-based multimodality CNN in-loop filter for versatile video coding
Multimedia Tools and Applications ( IF 3.0 ) Pub Date : 2021-07-24 , DOI: 10.1007/s11042-021-11214-2
Menghu Jia, Jian Yue, Mao Ye, Yanbo Gao, Shuai Li

The newest video coding standard, Versatile Video Coding (VVC), was recently published. While it greatly improves performance over the previous High Efficiency Video Coding (HEVC) standard, blocking artifacts still occur under its more flexible block partitioning structures. To reduce blocking artifacts and improve the quality of reconstructed video frames, an explicit self-attention-based multimodality convolutional neural network (CNN) is proposed in this paper. It adaptively adjusts the restoration of different coding units (CUs) according to the CU partition structure and the texture of the reconstructed frame, considering that the loss scales of different CUs can differ substantially. The proposed method exploits the CU partition map by treating it as a separate modality and combining it with the attention mechanism. Moreover, the unfiltered reconstructed image is also used to enhance the attention branch, forming an explicit self-attention model. A densely integrated multi-stage fusion is then developed, in which the attention branch is densely fused into the main filtering CNN to adaptively adjust the overall image recovery scale. A thorough analysis of the proposed method is provided, with an ablation study on each module. Experimental results show that the proposed method achieves state-of-the-art performance under the all-intra (AI) configuration, with 7.24% BD-rate savings on average compared with the VVC reference software (VTM).
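The core idea of the abstract can be illustrated with a toy sketch: an attention branch driven by the CU partition map (used as a second modality) and the unfiltered reconstruction produces a per-pixel attention map, which rescales the restoration residual predicted by the main filtering branch. The code below is a minimal NumPy illustration under assumed single-stage fusion and hand-rolled 3x3 convolutions; the function names, kernels, and fusion form are hypothetical and do not reproduce the authors' trained network, which uses learned multi-stage dense fusion.

```python
import numpy as np

def conv3x3(x, kernel):
    """Naive 'same'-padded 3x3 convolution on a 2-D array (illustration only)."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def explicit_attention_filter(recon, partition_map, k_main, k_att):
    """Toy single-stage version of attention-modulated in-loop filtering.

    recon:         unfiltered reconstructed frame (here doubling as the
                   self-attention input, per the 'explicit self-attention' idea)
    partition_map: CU partition structure encoded as a 2-D map (second modality)
    k_main, k_att: hypothetical 3x3 kernels standing in for learned CNN layers
    """
    # Attention branch: fuse the partition-map modality with the unfiltered
    # reconstruction into a per-pixel attention map in (0, 1).
    att = sigmoid(conv3x3(partition_map, k_att) + conv3x3(recon, k_att))
    # Main branch: predict a restoration residual, then let the attention map
    # rescale it per pixel before adding it back to the reconstruction.
    residual = conv3x3(recon, k_main)
    return recon + att * residual
```

The attention map lets CUs with different partition depths (and hence different loss scales) receive differently scaled corrections, which is the adaptivity the abstract describes; the real model stacks several such fusion stages densely instead of the single stage shown here.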




Updated: 2021-07-24