A Deep Learning Approach in Scalable High Efficiency Video Coding for Fast Coding Unit Size Decision, IETE Technical Review


A Deep Learning Approach in Scalable High Efficiency Video Coding for Fast Coding Unit Size Decision
IETE Technical Review (IF 2.5), Pub Date: 2022-07-22, DOI: 10.1080/02564602.2022.2100492
Sanagavarapu Karthik Sairam, Pullakandam Muralidhar

Scalable High Efficiency Video Coding (SHVC), the scalable extension of High Efficiency Video Coding (HEVC), allows the same video to be encoded at different resolutions in a single bitstream. However, this increases the encoding complexity, mainly because of the motion estimation process and the Rate-Distortion Optimization (RDO) search process. The Test Zonal (TZ) algorithm, with its fast search patterns, helps to accelerate motion estimation, but the search patterns may get trapped in local minima, leading to inaccurate motion vectors. Moreover, the RDO search process used to determine the Coding Unit (CU) size adds further complexity. We propose the Horizontal Subsampling Motion Estimation (HSME) method to find accurate motion vectors with reduced complexity. The experimental results show that the HSME method reduces the encoding time by 53.03%, with a 6.45% increase in Bjøntegaard Delta bit rate and a 0.28 dB loss in Bjøntegaard Delta Peak Signal-to-Noise Ratio (BD-PSNR), compared to standard SHM-12.1. In addition, we design the Early Terminated Long- and Short-Term Memory (ET-LSTM) network, which predicts the CU partition from the output features of the Early Terminated Convolutional Neural Network (ET-CNN). The ET-CNN learns the CU partitions from the residual Coding Tree Unit (CTU). Our proposed method (HSME + ET-CNN + ET-LSTM) achieves a 53% saving in encoding time, which is significantly higher than that of state-of-the-art methods.
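
The abstract does not describe the internals of HSME. As a rough illustration of the general idea that horizontal subsampling lowers the cost of evaluating each motion-search candidate, the Python sketch below computes a sum of absolute differences (SAD) on every `col_step`-th column of a block and uses it in an exhaustive search. The names `sad` and `full_search`, the `col_step` parameter, and the exhaustive search pattern are all assumptions made for illustration; this is not the authors' method, nor the TZ search used in SHM.

```python
def sad(cur, ref, bx, by, dx, dy, size, col_step=1):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`.

    col_step=1 uses every column; col_step=2 skips every other column
    (hypothetical horizontal subsampling, halving the work per candidate).
    """
    total = 0
    for y in range(size):
        cur_row = cur[by + y]
        ref_row = ref[by + dy + y]
        for x in range(0, size, col_step):
            total += abs(cur_row[bx + x] - ref_row[bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range, col_step=1):
    """Return ((dx, dy), cost) minimising SAD within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block falls outside the reference frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size, col_step)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

With `col_step=2` each candidate touches half as many pixels, which is where the time saving in such a scheme would come from; the trade-off is that detail lying only on skipped columns is ignored, which can cost some matching accuracy.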


Updated: 2022-07-22
