Display Stream Compression Decoders for Fine-Grained Many-Core Processor Arrays
IEEE Transactions on Circuits and Systems II: Express Briefs (IF 4.0) Pub Date: 2021-03-23, DOI: 10.1109/tcsii.2021.3068272
Shifu Wu, Bevan M. Baas

This brief presents two software Display Stream Compression (DSC) video decoder designs for many-core processor arrays. The first design exploits fine-grained task-level parallelism and decodes pictures configured into one column of slices; it is implemented with 88 processors and 2 shared memory modules. The second design achieves higher performance by leveraging scalable slice-level parallelism and is tailored for pictures configured into multiple columns of slices; one implementation of this design is mapped to 359 processors and 6 shared memory modules. At 1.75 GHz and 1.1 V, the proposed decoders decode 1080p video sequences in 4:2:0, 4:2:2, and 4:4:4 pixel formats, achieving up to 94.7 frames per second (fps), 95.6 fps, and 47.9 fps while dissipating 23.9 nJ, 26.7 nJ, and 47.2 nJ per pixel, respectively. Our designs achieve up to 159× higher throughput and 841× lower energy per pixel than a DSC decoder implemented on one core of an Intel i7-7700HQ processor.
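
To put the reported figures in perspective, the following is a rough back-of-the-envelope sketch, not taken from the paper. It converts each frame rate into a pixel rate and multiplies by the energy per pixel to get an implied average power. It assumes that "1080p" means 1920×1080 pixels per frame, that "per pixel" counts one full-resolution pixel position regardless of chroma format, and that each frame rate and energy figure listed "respectively" belong to the same operating point. The function names are hypothetical and chosen only for illustration.

```python
# Rough arithmetic on the abstract's numbers; all pairings are assumptions.
WIDTH, HEIGHT = 1920, 1080  # assumption: "1080p" = 1920x1080 pixels per frame

# (frames per second, nanojoules per pixel) as listed in the abstract
REPORTED = {
    "4:2:0": (94.7, 23.9),
    "4:2:2": (95.6, 26.7),
    "4:4:4": (47.9, 47.2),
}


def pixels_per_frame(width=WIDTH, height=HEIGHT):
    """Number of pixel positions in one frame."""
    return width * height


def pixel_rate(fps):
    """Decoded pixels per second at the given frame rate."""
    return fps * pixels_per_frame()


def implied_power_watts(fps, nj_per_pixel):
    """Average power implied by energy per pixel times pixels per second."""
    return pixel_rate(fps) * nj_per_pixel * 1e-9


if __name__ == "__main__":
    for fmt, (fps, nj) in REPORTED.items():
        rate_mpx = pixel_rate(fps) / 1e6
        power = implied_power_watts(fps, nj)
        print(f"{fmt}: {rate_mpx:.1f} Mpixel/s, about {power:.2f} W implied")
```

If the peak frame rate and the energy figure were in fact measured on different designs or configurations, the implied power values would not apply; the sketch only shows how the two kinds of metric relate.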

Updated: 2021-03-23