Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC
IEEE Transactions on Multimedia ( IF 8.4 ) Pub Date : 2020-05-06 , DOI: 10.1109/tmm.2020.2992968
Mingliang Zhou , Xuekai Wei , Sam Kwong , Weijia Jia , Bin Fang

Rate control (RC) plays a critical role in the transmission of high-quality video data under certain bandwidth restrictions in High Efficiency Video Coding (HEVC). Most current HEVC RC algorithms based on spatio-temporal information for rate-distortion (R-D) model parameters cannot effectively handle the cases with dynamic video sequences that contain fast moving objects, significant object occlusion or scene changes. In this paper, we propose an RC method based on deep reinforcement learning (DRL) for dynamic video sequences in HEVC to improve the coding efficiency. First, the rate control problem is formulated as a Markov decision process (MDP) problem. Second, with the MDP model, we develop a DRL-based algorithm to find the optimal quantization parameters (QPs) by training a deep neural network. The resulting intelligent agent selects the optimal RC strategy to reduce distortion, buffer and quality fluctuations by observing the current state of the encoder. The asynchronous advantage actor-critic (A3C) method is used to solve the MDP problem. Finally, the proposed DRL-based RC method is implemented in the newest video coding standard. Experimental results show that the proposed method offers substantially enhanced RC accuracy and consistently outperforms HEVC reference software and other state-of-the-art algorithms.
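The abstract does not specify the state, action, or reward of the MDP, so the following is only a minimal sketch of how such a formulation could look, not the authors' method. It assumes a hypothetical state (buffer fullness, previous QP, previous distortion), an action that shifts the QP, a toy R-D model in which bits halve every 6 QP steps, and a reward that penalizes distortion, buffer occupancy, and quality fluctuation with illustrative weights. All names (`EncoderState`, `toy_encode`, `step`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

QP_MIN, QP_MAX = 0, 51  # valid QP range for 8-bit HEVC


@dataclass
class EncoderState:
    """Hypothetical observation the agent sees before choosing a QP."""
    buffer_fullness: float   # bits currently waiting in the buffer
    prev_qp: int             # QP used for the previous frame
    prev_distortion: float   # distortion of the previous frame


def toy_encode(qp: int, complexity: float) -> Tuple[float, float]:
    """Toy R-D model (an assumption, not a real encoder).

    Bits halve every 6 QP steps; distortion doubles every 3 QP steps,
    both normalized so that QP 22 gives `complexity` bits and distortion 1.
    """
    bits = complexity * 2.0 ** (-(qp - 22) / 6.0)
    distortion = 2.0 ** ((qp - 22) / 3.0)
    return bits, distortion


def step(state: EncoderState, qp_delta: int, complexity: float,
         target_bits: float, buffer_size: float,
         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)
         ) -> Tuple[EncoderState, float]:
    """One MDP transition: apply the action, encode, and score the result."""
    qp = max(QP_MIN, min(QP_MAX, state.prev_qp + qp_delta))
    bits, distortion = toy_encode(qp, complexity)

    # The buffer fills with produced bits and drains at the target rate.
    fullness = max(0.0, state.buffer_fullness + bits - target_bits)

    w_dist, w_buf, w_fluct = weights
    buffer_penalty = fullness / buffer_size
    fluctuation = abs(distortion - state.prev_distortion)
    reward = -(w_dist * distortion + w_buf * buffer_penalty + w_fluct * fluctuation)

    return EncoderState(fullness, qp, distortion), reward
```

In a sketch like this, a learned policy (for example, the actor of an A3C agent) would choose `qp_delta` from the observed `EncoderState`, and `step` would supply the reward signal during training. The weights trade off the three objectives the abstract names and would need tuning in practice.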

Updated: 2020-05-06