A Dynamic Timing Enhanced DNN Accelerator With Compute-Adaptive Elastic Clock Chain Technique
IEEE Journal of Solid-State Circuits ( IF 5.4 ) Pub Date : 2021-01-01 , DOI: 10.1109/jssc.2020.3027953
Tianyu Jia , Yuhao Ju , Jie Gu

This article presents a deep neural network (DNN) accelerator that uses an adaptive clocking technique, an elastic clock chain, to exploit the dynamic timing margin of a 2-D processing element (PE) array-based DNN accelerator. To address the two major challenges in exploiting dynamic timing margin on modern deep learning accelerators, namely the diminishing dynamic timing margin of a large array and the strong timing dependence on runtime operands, this work proposes an elastic clock chain scheme that provides flexible multi-domain clock management for in situ compute adaptability. More specifically, a total of 16 clock domains are created for the 2-D PE array, with the clock periods dynamically adjusted based on both runtime instructions and operands. The multi-domain clock sources are generated from a multi-phase delay-locked loop (DLL) and delivered by a global clock bus. The clock offsets between neighboring domains are deliberately managed to maintain synchronization among the clock domains. A 16 × 8 PE array supporting different DNN dataflows and bit precisions was fabricated in a 65-nm CMOS process. Measurement results on the MNIST and CIFAR-10 data sets show that enabling the proposed elastic clock chain improves the effective operating frequency by up to 19% for a single instruction multiple data (SIMD) dataflow, which translates into up to 34% energy savings. Compared with the SIMD dataflow, the systolic dataflow shows a smaller improvement of up to 11% because all in-flight operand values must be considered.
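To make the clock-chain idea more concrete, below is a minimal behavioral sketch in Python. It is not the authors' implementation and is not derived from the paper's circuits; the slack model, the 19% speedup bound, the skew limit, and all function names are illustrative assumptions. It only shows the general notion of per-domain clock periods that stretch or shrink with operand-dependent timing slack while neighboring domains are kept within a bounded phase offset.

```python
# Behavioral sketch (not the paper's design) of an elastic clock chain:
# 16 clock domains each pick their next period from an operand-dependent
# slack estimate, and the phase offset between adjacent domains is clamped
# so the chain stays loosely synchronized. All constants are assumptions.

NOMINAL_PERIOD = 1.0       # normalized period at worst-case timing margin
MAX_SPEEDUP = 0.19         # assumed ceiling, echoing the reported 19% gain
MAX_NEIGHBOR_SKEW = 0.5    # assumed bound on adjacent-domain phase offset

def operand_period(operand_slack: float) -> float:
    """Map an operand-dependent slack estimate in [0, 1] to a clock period.
    slack = 0 -> worst-case path exercised, keep the nominal period;
    slack = 1 -> short path, shrink the period by MAX_SPEEDUP."""
    return NOMINAL_PERIOD * (1.0 - MAX_SPEEDUP * operand_slack)

def step_chain(phases, slacks):
    """Advance each domain by one operand-adapted cycle, then clamp each
    domain's phase to stay within MAX_NEIGHBOR_SKEW of its left neighbor."""
    phases = [p + operand_period(s) for p, s in zip(phases, slacks)]
    for i in range(1, len(phases)):
        lo = phases[i - 1] - MAX_NEIGHBOR_SKEW
        hi = phases[i - 1] + MAX_NEIGHBOR_SKEW
        phases[i] = min(max(phases[i], lo), hi)
    return phases

if __name__ == "__main__":
    import random
    random.seed(0)
    phases = [0.0] * 16                    # 16 clock domains
    cycles = 100
    for _ in range(cycles):
        slacks = [random.random() for _ in range(16)]  # stand-in for operand analysis
        phases = step_chain(phases, slacks)
    avg_period = sum(phases) / (16 * cycles)
    print(f"average effective period: {avg_period:.3f} (nominal = {NOMINAL_PERIOD})")
```

In this toy model the average effective period falls below nominal whenever the operands exercise short paths, which is the same intuition behind the reported frequency and energy gains; the real design manages the offsets in hardware with a multi-phase DLL and a global clock bus rather than in software.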
