SRNPU: An Energy-Efficient CNN-based Super-Resolution Processor with Tile-based Selective Super-Resolution in Mobile Devices
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 4.6), Pub Date: 2020-09-01, DOI: 10.1109/jetcas.2020.3014454
Juhyoung Lee, Jinsu Lee, Hoi-Jun Yoo

In this article, we propose an energy-efficient convolutional neural network (CNN)-based super-resolution (SR) processor for mobile applications, the super-resolution neural processing unit (SRNPU). Real-time CNN-based SR has traditionally been hard to realize on resource-limited platforms such as mobile devices. The obstacles are its massive computational workload and the heavy communication bandwidth it demands from external memory. The SRNPU supports tile-based selective super-resolution (TSSR), which dynamically selects a properly sized CNN on a tile-by-tile basis. TSSR reduces the computational workload of CNN-based SR by 31.1% while maintaining image restoration performance. Moreover, the proposed selective-caching-based convolutional layer fusion (SC2LF) caches only intermediate feature maps with short reuse distances. Compared with previous layer fusion methods, it reduces external memory bandwidth by 78.8% and needs a 93.2% smaller on-chip memory footprint. Additionally, the SRNPU uses a reconfigurable cyclic ring architecture that amortizes the reloading process caused by SC2LF operation. This keeps processing element (PE) utilization high under various convolutional layer configurations. The SRNPU is fabricated in 65 nm CMOS technology and occupies a $4 \times 4$ mm$^2$ die area. It has a peak power efficiency of 1.9 TOPS/W at 0.75 V and 50 MHz. The SRNPU achieves 31.8 fps for $\times 2$ Full-HD generation and 88.3 fps for $\times 4$ Full-HD generation. In both cases it delivers higher restoration performance and power efficiency than previous SR hardware implementations. To the best of our knowledge, the SRNPU is the first ASIC implementation of a CNN-based SR algorithm that supports real-time Full-HD up-scaling.
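
The abstract does not state how TSSR decides which network a tile receives, so the following is only a rough sketch of the general idea of per-tile model selection. It assumes two hypothetical candidate networks, "small" and "large". As a placeholder for tile difficulty, it compares each tile's pixel variance against a hypothetical `threshold`. The names `select_models_per_tile` and `relative_workload`, and the per-tile costs, are illustrative assumptions, not the authors' method.

```python
from statistics import pvariance


def select_models_per_tile(image, tile, threshold):
    """Hypothetical TSSR-style selection: choose a network size per tile.

    image is a grayscale image as a list of rows. Each tile x tile block
    whose pixel variance exceeds threshold (an assumed stand-in for
    difficulty) is assigned the "large" network; the rest get "small".
    """
    h, w = len(image), len(image[0])
    choices = {}
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            pixels = [p for row in image[y:y + tile] for p in row[x:x + tile]]
            choices[(y, x)] = "large" if pvariance(pixels) > threshold else "small"
    return choices


def relative_workload(choices, cost_small, cost_large):
    """Workload relative to running the large network on every tile.

    Assumes equal-sized tiles; cost_small and cost_large are hypothetical
    per-tile costs (for example, multiply-accumulate counts).
    """
    n_large = sum(1 for m in choices.values() if m == "large")
    n_small = len(choices) - n_large
    return (n_large * cost_large + n_small * cost_small) / (len(choices) * cost_large)


if __name__ == "__main__":
    # 64 x 64 toy image: flat left half, 0/255 checkerboard right half.
    img = [[0 if x < 32 else 255 * ((x + y) % 2) for x in range(64)] for y in range(64)]
    chosen = select_models_per_tile(img, tile=32, threshold=100.0)
    print(chosen)
    print(relative_workload(chosen, cost_small=1.0, cost_large=4.0))
```

Consider a toy image whose left half is flat and whose right half is textured. Only the two right-hand tiles get the large network. With a 4:1 cost ratio, the workload drops to 62.5% of the all-large baseline. The 31.1% reduction reported for TSSR comes from the authors' own selection scheme and networks, not from this toy.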

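A toy model can show the trade-off behind caching only data with short reuse distances. This sketch assumes that tiles are processed in raster order on a `rows` × `cols` grid. Each tile is assumed to hand boundary data to its right neighbour (reuse distance 1) and to its lower neighbour (reuse distance `cols`). Transfers with a reuse distance of at most `max_distance` stay on chip; the rest are counted as reloads from external memory. This dependency pattern, the function `plan_selective_caching`, and `max_distance` are assumptions for illustration. They are not the published SC2LF policy.

```python
def plan_selective_caching(rows, cols, max_distance):
    """Return (cached, reloaded, peak_buffer) for a hypothetical raster-order tiling.

    Tile t = r * cols + c passes boundary data to its right neighbour
    (consumed at step t + 1) and its lower neighbour (consumed at t + cols).
    Only transfers whose reuse distance is <= max_distance stay on chip.
    """
    cached_items = []  # (produce_step, consume_step) pairs held on chip
    reloaded = 0
    for r in range(rows):
        for c in range(cols):
            t = r * cols + c
            consumers = []
            if c + 1 < cols:
                consumers.append(t + 1)     # right neighbour: short reuse distance
            if r + 1 < rows:
                consumers.append(t + cols)  # lower neighbour: long reuse distance
            for q in consumers:
                if q - t <= max_distance:
                    cached_items.append((t, q))
                else:
                    reloaded += 1           # spilled, then reloaded from external memory
    # Peak on-chip occupancy: items produced by step t but not yet consumed.
    peak = max(
        (sum(1 for p, q in cached_items if p <= t < q) for t in range(rows * cols)),
        default=0,
    )
    return len(cached_items), reloaded, peak


if __name__ == "__main__":
    print(plan_selective_caching(4, 8, max_distance=1))  # cache right-neighbour data only
    print(plan_selective_caching(4, 8, max_distance=8))  # cache everything
```

On a 4 × 8 grid, caching only the distance-1 data keeps 28 transfers on chip with a one-entry buffer and reloads 24. Caching everything removes all reloads but needs a 9-entry buffer. The toy thus trades some reload traffic for a far smaller buffer, the same kind of trade the abstract describes. In this model the buffer grows with the tile-row width.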
Updated: 2020-09-01