A Cross Channel Context Model for Latents in Deep Image Compression
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-03-04, DOI: arxiv-2103.02884
Changyue Ma, Zhao Wang, Ruling Liao, Yan Ye

This paper presents a cross channel context model for latents in deep image compression. Generally, deep image compression is based on an autoencoder framework, which transforms the original image into latents at the encoder and recovers the reconstructed image from the quantized latents at the decoder. The transform is usually combined with an entropy model, which estimates the probability distribution of the quantized latents for arithmetic coding. Currently, joint autoregressive and hierarchical prior entropy models are widely adopted to capture both the global contexts from the hyper latents and the local contexts from the quantized latent elements. For the local contexts, the widely adopted 2D mask convolution can only capture spatial context. However, we observe that there are strong correlations between different channels in the latents. To exploit these cross channel correlations, we propose to divide the latents into several groups according to channel index and code the groups one by one, where previously coded groups provide cross channel context for the current group. The proposed cross channel context model is combined with the joint autoregressive and hierarchical prior entropy model. Experimental results show that, using PSNR as the distortion metric, the combined model achieves BD-rate reductions of 6.30% and 6.31% over the baseline entropy model, and 2.50% and 2.20% over the latest video coding standard Versatile Video Coding (VVC), on the Kodak and CVPR CLIC2020 professional datasets, respectively. In addition, when optimized for the MS-SSIM metric, our approach generates visually more pleasant reconstructed images.
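The channel-grouped coding order described above can be sketched minimally in NumPy. This is an illustrative toy, not the authors' implementation: the `code_groups` function and its bookkeeping are assumptions for demonstration, and the "context" here is simply the stack of already-coded groups, standing in for the features a real entropy model would condition on.

```python
import numpy as np

def split_into_groups(latents, num_groups):
    """Split a (C, H, W) latent tensor into groups along the channel axis."""
    return np.array_split(latents, num_groups, axis=0)

def code_groups(latents, num_groups):
    """Code channel groups one by one; previously coded groups form the
    cross channel context for the current group. Returns, for each group,
    how many context channels were available when it was coded."""
    groups = split_into_groups(latents, num_groups)
    coded = []
    context_sizes = []
    for g in groups:
        # Cross channel context: all previously coded groups concatenated.
        if coded:
            ctx = np.concatenate(coded, axis=0)
        else:
            ctx = np.zeros((0,) + g.shape[1:])  # first group has no cross channel context
        context_sizes.append(ctx.shape[0])
        # In a real codec the group would now be quantized and entropy-coded,
        # with the distribution parameters predicted from ctx (plus the
        # spatial context from mask convolution and the hyper prior).
        coded.append(g)
    return context_sizes

latents = np.random.randn(8, 4, 4)   # C=8 channels, 4x4 spatial resolution
print(code_groups(latents, 4))       # [0, 2, 4, 6]
```

The key property the sketch shows is the strictly causal ordering across channels: group k sees only groups 0..k-1, so the decoder can reproduce the same context after decoding each group in turn.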

Updated: 2021-03-05