ROMANet: Fine-Grained Reuse-Driven Off-Chip Memory Access Management and Data Organization for Deep Neural Network Accelerators
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8) | Pub Date: 2021-03-04 | DOI: 10.1109/tvlsi.2021.3060509
Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Enabling high energy efficiency is crucial for embedded implementations of deep learning. Several studies have shown that DRAM-based off-chip memory accesses are among the most energy-consuming operations in deep neural network (DNN) accelerators and thereby prevent designs from reaching their full efficiency potential. DRAM access energy depends on both the number of accesses required and the energy consumed per access, so minimizing the total DRAM access energy is an important optimization problem. Toward this, we propose the ROMANet methodology, which reduces the number of memory accesses by searching, through a design space exploration, for appropriate data partitioning and scheduling for each layer of a network, based on knowledge of the available on-chip memory and the data reuse factors. Moreover, ROMANet also reduces the number of DRAM row-buffer conflicts and misses by exploiting the DRAM multibank burst feature, thereby improving the energy per access. Besides these energy benefits, our proposed DRAM data mapping also increases the effective DRAM throughput, which is useful in latency-constrained scenarios. Our experimental results show that ROMANet saves DRAM access energy by 12% for AlexNet, 36% for VGG-16, 46% for MobileNet, and 45% for SqueezeNet, while improving DRAM throughput by 10% on average across these networks, compared to the state-of-the-art bus-width-aware (BWA) technique.
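The core of the methodology is a per-layer design space exploration over tilings that fit the on-chip buffer. As a rough illustration of that idea, the sketch below enumerates candidate tile sizes for one convolutional layer and keeps the tiling with the lowest estimated DRAM traffic. The buffer size, candidate tile sizes, and traffic model here are all assumptions for illustration, not the paper's actual cost model or algorithm.

```python
# A minimal sketch of a reuse-driven design space exploration for one
# convolutional layer; the buffer size, tile candidates, and DRAM-traffic
# model are hypothetical, not ROMANet's actual formulation.
import math
from itertools import product

BUFFER_BYTES = 128 * 1024  # assumed on-chip SRAM capacity

def tile_fits(th, tw, tc, tk, R, bytes_per_elem=1):
    """Check that input, weight, and output tiles fit on-chip together."""
    in_tile  = (th + R - 1) * (tw + R - 1) * tc
    w_tile   = R * R * tc * tk
    out_tile = th * tw * tk
    return (in_tile + w_tile + out_tile) * bytes_per_elem <= BUFFER_BYTES

def dram_accesses(H, W, C, K, R, th, tw, tc, tk):
    """Rough traffic model: each operand is re-fetched once per tile of the
    loop dimension that does not reuse it (inputs per output-channel tile,
    weights per spatial tile, outputs per input-channel tile)."""
    n_h, n_w = math.ceil(H / th), math.ceil(W / tw)
    n_c, n_k = math.ceil(C / tc), math.ceil(K / tk)
    return (n_k * H * W * C              # input fetches
            + n_h * n_w * R * R * C * K  # weight fetches
            + n_c * H * W * K)           # output writes

def explore(H=56, W=56, C=64, K=128, R=3):
    """Return the feasible tiling with the lowest estimated DRAM traffic."""
    best = None
    for th, tw, tc, tk in product([7, 14, 28, 56], [7, 14, 28, 56],
                                  [16, 32, 64], [16, 32, 64, 128]):
        if tile_fits(th, tw, tc, tk, R):
            cost = dram_accesses(H, W, C, K, R, th, tw, tc, tk)
            if best is None or cost < best[0]:
                best = (cost, (th, tw, tc, tk))
    return best

if __name__ == "__main__":
    cost, tiling = explore()
    print(f"best (th, tw, tc, tk) = {tiling}, est. DRAM accesses = {cost}")
```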
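The abstract also credits a DRAM data mapping that exploits the multibank burst feature to cut row-buffer conflicts and misses. The sketch below shows the general idea under an assumed DRAM geometry (the paper's exact mapping may differ): consecutive burst-sized chunks of a tensor are striped round-robin across banks, so a sequential tile stream cycles through all banks with open rows instead of repeatedly opening and closing rows in a single bank.

```python
# A minimal sketch of bank-interleaved data placement, with assumed DRAM
# geometry (8 banks, 2 KB rows, 64 B bursts); not the paper's exact mapping.

NUM_BANKS = 8      # assumed number of DRAM banks
ROW_BYTES = 2048   # assumed row-buffer (page) size per bank
BURST = 64         # assumed bytes moved per burst access

def interleaved_address(offset: int):
    """Map a linear tensor byte offset to (bank, row, column)."""
    chunk = offset // BURST
    bank = chunk % NUM_BANKS                    # stripe chunks over banks
    chunks_per_row = ROW_BYTES // BURST
    row = chunk // (NUM_BANKS * chunks_per_row)
    col = ((chunk // NUM_BANKS) % chunks_per_row) * BURST + offset % BURST
    return bank, row, col

# A sequential tile stream touches each bank once per round of NUM_BANKS
# bursts, so every bank serves its next burst from an already-open row
# rather than triggering a row-buffer conflict.
for offset in range(0, 16 * BURST, BURST):
    print(offset, interleaved_address(offset))
```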

Updated: 2021-04-02