Scalable Wavefront Parallel Streaming Deblocking Filter Hardware for HEVC Decoder
IEEE Transactions on Consumer Electronics (IF 4.3) Pub Date: 2020-02-01, DOI: 10.1109/tce.2019.2960565
Swamy Baldev, Kiran Kumar Anumandla, Rangababu Peesapati

The proposed work aims to design a Wavefront Parallel Processing (WPP) based streaming Deblocking Filter (DBF) architecture for High-Efficiency Video Coding (HEVC). The architecture supports scalable configurations of 1, 2, 4 and 8 pipeline stages, corresponding to Coding Unit (CU) sizes of $8\times 8$, $16\times 16$ and $32\times 32$ and to $64\times 64$ Largest Coding Unit (LCU) processing, respectively. One of these CU or LCU sizes is selected according to the speed and area requirements of the consumer electronics application. The hardware uses an intelligent Memory Organization (MO) based on the WPP technique, with restructured CU/LCU sizes, so that processing has no neighboring-block dependencies. The proposed designs are implemented on an Application-Specific Integrated Circuit (ASIC) using 180-nm technology and on a Field Programmable Gate Array (FPGA). Experimental results show that the $32\times 32$ and $64\times 64$ block-processing hardware reduces the processing cycles to 128 and 96, with gate counts of 286.47K and 744.13K, respectively. Similarly, the $8\times 8$ and $16\times 16$ CU configurations consume 512 and 176 processing clock cycles, with equivalent gate counts of 90.72K and 194.97K, respectively. The performance of the proposed hardware is compared with previous works in terms of area and speed. The results show that the proposed hardware can process 4K Ultra High Definition (UHD) video frames at a rate of 50 fps at 300 MHz.
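
To put the 50 fps at 300 MHz figure in context, the following is a minimal sketch of the per-frame and per-block clock-cycle budget that such a target implies. It assumes a 4K UHD frame of 3840×2160 luma samples and that partial blocks at the frame edge are padded to a full block; neither assumption is stated in the abstract, and the function name `cycle_budget` is hypothetical. The abstract does not say what unit of work the reported cycle counts (512, 176, 128, 96) cover, so the sketch deliberately does not compare them against the budget.

```python
import math


def cycle_budget(width, height, fps, clock_hz, block):
    """Return (blocks_per_frame, cycles_per_frame, cycles_per_block).

    Assumption: partial blocks at the right/bottom frame edge are padded
    to a full block, hence the ceiling division.
    """
    blocks = math.ceil(width / block) * math.ceil(height / block)
    cycles_per_frame = clock_hz // fps
    return blocks, cycles_per_frame, cycles_per_frame / blocks


if __name__ == "__main__":
    # Assumed 4K UHD resolution: 3840 x 2160 (not stated in the abstract).
    for block in (8, 16, 32, 64):
        blocks, per_frame, per_block = cycle_budget(
            3840, 2160, fps=50, clock_hz=300_000_000, block=block
        )
        print(f"{block}x{block}: {blocks} blocks/frame, "
              f"{per_frame} cycles/frame, {per_block:.1f} cycles/block")
```

Under these assumptions, a 300 MHz clock leaves 6,000,000 cycles per frame at 50 fps, shared among 2040 blocks of $64\times 64$ or 129,600 blocks of $8\times 8$.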

Updated: 2020-02-01