Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging
Journal of Systems Architecture (IF 3.7), Pub Date: 2021-03-13, DOI: 10.1016/j.sysarc.2021.102094
Miao Wang, Xiaoya Fan, Wei Zhang, Ting Zhu, Tengteng Yao, Hui Ding, Danghui Wang

Embedded devices are common carriers for deploying inference networks, which leverage customized accelerators to achieve the promised performance under strict resource constraints. In Deep Neural Network (DNN) inference, the sparsity present in the activations and weights of every layer produces massive ineffective memory accesses and computing operations. Data compression is adopted as a data pruning method in accelerator design, eliminating zero-valued data through a specific data packaging method. However, data compression breaks, to varying degrees, the data regularity that the processing arrays of DNN accelerators compute with. The complexity of data access caused by this irregular data organization requires extra control logic and decoding logic to compensate.
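To make the trade-off concrete, the following is a minimal sketch of one common zero-elimination packaging, bitmap encoding, in Python; it is a generic illustration of the kind of compression the paragraph describes, not necessarily the packaging used in this paper. Note that the number of stored values per row varies with sparsity, which is exactly the irregularity the decoder logic must compensate for.

```python
import numpy as np

def pack_bitmap(tensor):
    """Drop zeros: keep a 1-bit occupancy mask per element plus the
    dense list of non-zero values (generic bitmap compression)."""
    flat = tensor.ravel()
    mask = flat != 0                  # 1 bit per original element
    values = flat[mask]               # only the non-zero payload is stored
    return mask, values

def unpack_bitmap(mask, values, shape):
    """Decoder side: scatter the non-zeros back to their positions."""
    flat = np.zeros(mask.size, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

acts = np.array([[0, 3, 0, 0], [7, 0, 0, 2]], dtype=np.int8)
mask, vals = pack_bitmap(acts)
assert (unpack_bitmap(mask, vals, acts.shape) == acts).all()
```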

An accelerator architecture that supports sparsity can use a sophisticated memory-access scheme and a parallel on-chip decoder structure, built on an efficient data packaging method, to balance memory accessing and computing for acceleration. In this paper, we propose a flexible and highly parallel accelerator architecture that uses a quantitative data packaging method, which is efficient and stable across different degrees of sparsity, together with parallel optimization to exploit the sparsity in DNNs and achieve high performance with low energy consumption. The total DRAM accesses, performance, and energy consumption of the proposed sparse architecture are evaluated on different inference networks. Experiments show that the DRAM accesses of the proposed efficient data packaging method are significantly lower than those of other commonly used sparse compression storage methods; after adopting the optimization method proposed in this paper, the sparse accelerator architecture improves performance by up to 1.2x and saves up to 1.6x energy over a comparably provisioned accelerator without sparsity support. In addition, the proposed accelerator architecture achieves energy-efficiency and performance improvements of up to 1.70x and 1.56x, respectively, compared with state-of-the-art architectures.
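The abstract does not detail the quantitative packaging method itself. As a hedged sketch of what a sparsity-stable, parallel-decodable packaging could look like, the Python fragment below packs each fixed-size group of elements into a fixed-size package with a fixed quota of (value, offset) pairs; GROUP and QUOTA are hypothetical parameters introduced here for illustration only. Because every package has the same size, package i sits at a constant stride, so independent on-chip decoders can expand many packages in parallel without scanning earlier ones.

```python
import numpy as np

GROUP = 8   # original elements per package (assumed, illustrative)
QUOTA = 4   # non-zero slots reserved per package (assumed, illustrative)

def pack_fixed_quota(flat):
    """Each GROUP-element chunk becomes QUOTA (value, offset) pairs,
    zero-padded when the chunk has fewer non-zeros, so all packages
    have identical size regardless of the local sparsity."""
    packages = []
    for base in range(0, flat.size, GROUP):
        chunk = flat[base:base + GROUP]
        nz = np.flatnonzero(chunk)
        assert nz.size <= QUOTA, "denser chunks would need a spill path"
        vals = np.zeros(QUOTA, dtype=flat.dtype)
        offs = np.zeros(QUOTA, dtype=np.uint8)
        vals[:nz.size] = chunk[nz]
        offs[:nz.size] = nz.astype(np.uint8)
        packages.append((vals, offs))
    return packages

def unpack_fixed_quota(packages, n):
    """Decoder: every package is self-contained, so all of them could
    be expanded by parallel on-chip decoders in one step."""
    flat = np.zeros(n, dtype=packages[0][0].dtype)
    for i, (vals, offs) in enumerate(packages):
        live = vals != 0                       # padded slots carry zero
        flat[i * GROUP + offs[live]] = vals[live]
    return flat

acts = np.array([0, 5, 0, 0, 0, 9, 0, 1,
                 2, 0, 0, 0, 0, 0, 4, 0], dtype=np.int8)
pkgs = pack_fixed_quota(acts)
assert (unpack_fixed_quota(pkgs, acts.size) == acts).all()
```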




Updated: 2021-03-23