STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators
arXiv - CS - Hardware Architecture. Pub Date: 2020-06-10, DOI: arXiv:2006.07137
Francisco Muñoz-Martínez, José L. Abellán, Manuel E. Acacio, Tushar Krishna

The design of specialized architectures for accelerating the inference procedure of Deep Neural Networks (DNNs) is a booming area of research nowadays. First-generation rigid proposals have been rapidly replaced by more advanced flexible accelerator architectures able to efficiently support a variety of layer types and dimensions. As the complexity of the designs grows, it is more and more appealing for researchers to have cycle-accurate simulation tools at their disposal to allow for fast and accurate design-space exploration, and rapid quantification of the efficacy of architectural enhancements during the early stages of a design. To this end, we present STONNE (Simulation TOol of Neural Network Engines), a cycle-accurate, highly-modular and highly-extensible simulation framework that enables end-to-end evaluation of flexible accelerator architectures running complete contemporary DNN models. We use STONNE to model the recently proposed MAERI architecture and show how it can closely approach the performance results of the publicly available BSV-coded MAERI implementation. Then, we conduct a comprehensive evaluation and demonstrate that the folding strategy implemented for MAERI results in very low compute unit utilization (25% on average across 5 DNN models) which in the end translates into poor performance.
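The compute-unit utilization figure cited above (25% on average) can be illustrated with a minimal sketch. All names below are hypothetical and do not reflect STONNE's actual API: utilization here is simply the fraction of multiply-accumulate (MAC) units doing useful work each cycle, averaged over the simulated run.

```python
# Minimal sketch of the compute-unit utilization metric described above.
# Names are illustrative only, not STONNE's actual interface.

def average_utilization(active_macs_per_cycle, total_macs):
    """Fraction of MAC units busy each cycle, averaged over the run."""
    if not active_macs_per_cycle:
        return 0.0
    per_cycle = [active / total_macs for active in active_macs_per_cycle]
    return sum(per_cycle) / len(per_cycle)

# Example: a 64-MAC array where folding leaves many units idle.
trace = [16, 16, 32, 16]  # active MACs observed on four consecutive cycles
print(average_utilization(trace, 64))  # 0.3125
```

A cycle-accurate simulator can collect such a per-cycle activity trace directly, which is what makes this kind of bottleneck analysis possible at design time.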

Updated: 2020-06-15