Gated fusion network for SAO filter and inter frame prediction in Versatile Video Coding
Signal Processing: Image Communication (IF 3.4) Pub Date: 2022-08-19, DOI: 10.1016/j.image.2022.116839
Shiba Kuanar , Vassilis Athitsos , Dwarikanath Mahapatra , K.R. Rao

To achieve higher coding efficiency, the Versatile Video Coding (VVC) standard includes several new components at the expense of increased decoder computational complexity. At low bit rates, these tools often create ringing and contouring effects on the reconstructed frames and introduce blurring and distortion. To smooth these visual artifacts, the H.266/VVC framework supports four post-processing filter operations. State-of-the-art CNN-based in-loop filters typically deploy multiple networks for different quantization parameters and frame resolutions, which increases training resources and subsequently becomes an overhead during decoder frame reconstruction. This paper presents a single deep-learning-based model for the sample adaptive offset (SAO) non-linear filtering operation on the decoder side, exploits feature correlation among adjacent frames, and substantiates the merits of intra-inter frame quality enhancement. We introduce a variable-filter-size dual multi-scale convolutional neural network (D-MSCNN) to attenuate compression artifacts and incorporate strided deconvolution to restore high-frequency details in the distorted frame. Our model follows sequential training across all QP values and updates the model weights accordingly. Using data augmentation, weight fusion, and residual learning, we demonstrate that our model can be trained effectively by transferring the convolution prior feature indices to the decoder to produce a dense output map. Objective measurements show that the proposed method outperforms the baseline VVC method on PSNR, MS-SSIM, and VMAF metrics and achieves an average of 5.16% bit-rate saving across different test-sequence categories.
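As an illustrative sketch only (not the authors' implementation), the gated-fusion idea the abstract describes can be pictured as two convolutional branches with different receptive fields whose outputs are blended by a learned pixel-wise sigmoid gate, with a residual connection back to the distorted input. All kernel sizes, the single-channel setting, and the gating formulation below are assumptions made for illustration:

```python
import numpy as np

def conv2d(x, k):
    """Naive 'same'-padded 2D convolution (single channel), standing in for a CNN layer."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(frame, k_small, k_large, k_gate):
    """Fuse two multi-scale branches with a pixel-wise gate; add a residual connection.

    frame   : distorted (decoded) luma patch, float in [0, 1]
    k_small : small-receptive-field branch kernel (e.g. 3x3, hypothetical)
    k_large : large-receptive-field branch kernel (e.g. 5x5, hypothetical)
    k_gate  : kernel producing the gate logits
    """
    b1 = conv2d(frame, k_small)           # fine-detail branch
    b2 = conv2d(frame, k_large)           # wide-context branch
    g = sigmoid(conv2d(frame, k_gate))    # gate in (0, 1), per pixel
    fused = g * b1 + (1.0 - g) * b2       # gated fusion of the two scales
    return frame + fused                  # residual learning: predict only the correction

# Usage: with all-zero kernels the gate is 0.5 but both branches are zero,
# so the residual path returns the input frame unchanged.
frame = np.full((8, 8), 0.5, dtype=np.float64)
restored = gated_fusion(frame, np.zeros((3, 3)), np.zeros((5, 5)), np.zeros((3, 3)))
```

In a trained network the kernels would be learned per branch; the residual formulation means the network only has to model the compression artifact, not the whole frame.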




Updated: 2022-08-19