ESSA: An energy-Aware bit-Serial streaming deep convolutional neural network accelerator
Journal of Systems Architecture (IF 4.5), Pub Date: 2020-07-03, DOI: 10.1016/j.sysarc.2020.101831
Lien-Chih Hsu, Ching-Te Chiu, Kuan-Ting Lin, Hsing-Huan Chou, Yen-Yu Pu

Over the past decade, deep convolutional neural networks (CNNs) have been widely embraced in visual recognition applications owing to their extraordinary accuracy. However, their high computational complexity and large data storage requirements present two challenges for CNN hardware design. In this paper, we propose an energy-aware bit-serial streaming deep CNN accelerator to tackle these challenges. By using a ring streaming dataflow and an output reuse strategy to decrease data access, we reduce external DRAM access for the convolutional layers of AlexNet by 357.26x compared with the no-output-reuse case. We optimize hardware utilization and avoid unnecessary computation by applying loop tiling and by mapping the strides of the convolutional layers to unit stride, which enhances computational performance. In addition, the bit-serial processing element (PE) is designed to use fewer weight bits, which reduces both the amount of computation and the external memory access. We evaluate our design with the well-known roofline model and explore the design space to find the solution with the best computational performance and communication-to-computation (CTC) ratio. Compared with the design in [1], we achieve a 1.36x speedup and reduce the energy consumed by external memory access by 41%. Implemented in TSMC 90-nm technology, our PE array architecture reaches an operating frequency of 119 MHz, occupies 68 k gates, and consumes 10.08 mW. Compared with the 15.4 MB of external memory access required by Eyeriss [2] for the convolutional layers of AlexNet, our method requires only 4.36 MB, dramatically reducing the costliest portion of the power consumption.
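
The 357.26x reduction in DRAM access comes from keeping partial sums on chip rather than spilling them to external memory. As a rough illustration only (a Python sketch with standard AlexNet layer shapes and a deliberately simplified traffic model, not the authors' accounting, so it will not reproduce the paper's exact figure), output reuse writes each output value to DRAM once instead of spilling and reloading a partial sum per input channel:

ALEXNET_CONV = [
    # (out_ch, in_ch, out_h, out_w, kernel)
    (96,    3, 55, 55, 11),
    (256,  48, 27, 27,  5),
    (384, 256, 13, 13,  3),
    (384, 192, 13, 13,  3),
    (256, 192, 13, 13,  3),
]

def psum_dram_words(output_reuse):
    """Count partial-sum words moved to/from external DRAM over all conv layers."""
    total = 0
    for out_ch, in_ch, out_h, out_w, k in ALEXNET_CONV:
        outputs = out_ch * out_h * out_w
        if output_reuse:
            # accumulate on chip; each output value is written to DRAM once
            total += outputs
        else:
            # spill and reload every partial sum once per input channel
            total += outputs * 2 * in_ch
    return total

print("partial-sum traffic ratio:", psum_dram_words(False) / psum_dram_words(True))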

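The bit-serial PE consumes the weight one bit per cycle, so trimming weight bits directly reduces both the cycle count and the bits fetched from memory. A minimal functional model of a bit-serial multiply-accumulate (unsigned weights only, not the paper's PE design) looks like:

def bit_serial_mac(acc, activation, weight, weight_bits):
    """Accumulate activation * weight by consuming one weight bit per cycle (LSB first)."""
    for b in range(weight_bits):          # one "cycle" per weight bit
        if (weight >> b) & 1:
            acc += activation << b        # shift-and-add partial product
    return acc

assert bit_serial_mac(0, 13, 11, 4) == 13 * 11   # 4-bit weight -> 4 cycles
assert bit_serial_mac(0, 13, 11, 8) == 13 * 11   # 8-bit weight -> 8 cycles, same result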

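The roofline evaluation bounds attainable throughput by min(peak compute, bandwidth x CTC) and sweeps tiling choices for the best operating point. Below is a minimal design-space sweep under that model; the peak-throughput and bandwidth figures, the tiling parameter tm, and the simplified traffic accounting (CTC taken here as operations per byte of external traffic) are hypothetical stand-ins rather than values from the paper:

PEAK_GOPS = 100.0   # hypothetical peak throughput of the PE array (GOP/s)
DRAM_GBPS = 4.0     # hypothetical external memory bandwidth (GB/s)

def evaluate(tm, layer):
    """CTC (ops/byte) and attainable GOP/s when tm output channels are computed per pass."""
    out_ch, in_ch, out_h, out_w, k = layer
    ops = 2 * out_ch * in_ch * out_h * out_w * k * k          # 2 ops per MAC
    # 16-bit data; the input feature map is re-streamed once per output-channel tile,
    # weights are streamed once, and outputs are written once (output reuse)
    ifmap_bytes  = (out_ch / tm) * in_ch * out_h * out_w * 2
    weight_bytes = out_ch * in_ch * k * k * 2
    ofmap_bytes  = out_ch * out_h * out_w * 2
    ctc = ops / (ifmap_bytes + weight_bytes + ofmap_bytes)
    return ctc, min(PEAK_GOPS, DRAM_GBPS * ctc)

conv3 = (384, 256, 13, 13, 3)                 # AlexNet conv3 shape
best_tm, (best_ctc, best_gops) = max(
    ((tm, evaluate(tm, conv3)) for tm in (1, 2, 4, 8, 16, 32)),
    key=lambda item: (item[1][1], item[1][0]),
)
print(f"tm={best_tm}: CTC={best_ctc:.1f} ops/byte, attainable={best_gops:.1f} GOP/s")

In this sketch, larger output-channel tiles raise the CTC ratio until the design becomes compute-bound, which is the kind of trade-off the design-space exploration selects among.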

