AERO: Design Space Exploration Framework for Resource-Constrained CNN Mapping on Tile-Based Accelerators
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7) Pub Date: 5-2-2022, DOI: 10.1109/jetcas.2022.3171826
Simei Yang, Debjyoti Bhattacharjee, Vinay B. Y. Kumar, Saikat Chatterjee, Sayandip De, Peter Debacker, Diederik Verkest, Arindam Mallik, Francky Catthoor

Analog In-Memory Compute (AIMC) arrays can store weights and perform matrix-vector multiplication operations for Deep Convolutional Neural Networks (CNNs). A number of recent efforts have integrated AIMC arrays into hybrid digital-analog accelerators in a multi-layer parallel manner to achieve energy efficiency and high throughput. Multi-layer parallelism on large-scale tile-based architectures needs efficient mapping support at both the processing element (PE) level (e.g., digital or analog processing elements) and the tile level. To find the most efficient architectures, fast and accurate design space exploration (DSE) support is required. In this paper, a novel DSE framework, AERO, is presented to characterize a CNN inference workload executing on hybrid tile-based architectures that support multi-layer parallelism. The framework has three characteristics: (1) It presents a hierarchical Tile/PE-level mapping exploration strategy that includes inter-layer interaction and allows layer fusion/splitting configurations for PE-level mapping optimization. (2) It unlocks different Performance, Power and Area (PPA) exploration points under both sufficient and limited resource constraints, whereas the limited-resource case is not considered in prior work on multi-layer parallel architectures. The impact of weight loading and weight-stationary mapping is analyzed for better insight into hybrid tile-based architectures. (3) It incorporates a detailed PPA model that supports a broad range of hybrid digital and analog units in a tile. Experimental case studies are performed on realistic and relevant benchmarks such as MLP and CNNs (LeNet-5 and ResNet-18, -34, -50, and -101).
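
The distinction between sufficient and limited resources can be made concrete with a rough sketch. The Python below is not AERO's mapping algorithm or PPA model; it is a generic back-of-the-envelope check, under the common assumption that a convolution layer's weights are unrolled into a matrix with `in_channels * kernel_h * kernel_w` rows and `out_channels` columns and then cut into array-sized blocks. The names `ConvLayer`, `arrays_needed`, and `fits_weight_stationary`, the 64x64 array size, and the LeNet-5-like layer shapes are all illustrative assumptions.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical layer description; not taken from the paper."""
    name: str
    in_channels: int
    out_channels: int
    kernel_h: int
    kernel_w: int


def arrays_needed(layer, array_rows, array_cols):
    """Count the arrays needed to hold one layer's unrolled weight matrix.

    Assumes rows = in_channels * kernel_h * kernel_w and cols = out_channels,
    with the matrix cut into array_rows x array_cols blocks.
    """
    rows = layer.in_channels * layer.kernel_h * layer.kernel_w
    cols = layer.out_channels
    return ceil(rows / array_rows) * ceil(cols / array_cols)


def fits_weight_stationary(layers, array_rows, array_cols, array_budget):
    """Return (fits, total): whether all layers can stay resident at once.

    If fits is False, some weights would have to be reloaded at run time,
    which corresponds to the limited-resource case.
    """
    total = sum(arrays_needed(l, array_rows, array_cols) for l in layers)
    return total <= array_budget, total


# Illustrative LeNet-5-like convolution shapes (assumed, fully connected
# between channels) and an assumed 64x64 array size.
layers = [
    ConvLayer("conv1", in_channels=1, out_channels=6, kernel_h=5, kernel_w=5),
    ConvLayer("conv2", in_channels=6, out_channels=16, kernel_h=5, kernel_w=5),
]

for budget in (4, 3):
    fits, total = fits_weight_stationary(layers, 64, 64, budget)
    print(f"budget={budget}: need {total} arrays, weight-stationary fits: {fits}")
```

With these assumed numbers the two layers need four arrays in total, so a budget of four allows a fully weight-stationary mapping while a budget of three forces weight loading for at least part of the network.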

Updated: 2024-08-26