Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation
arXiv - CS - Emerging Technologies. Pub Date: 2021-08-25, DOI: arxiv-2108.11430
Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen, David Z. Pan

Deep neural networks (DNNs) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though extensive efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully demonstrated, they are still bottlenecked by expensive memory accesses due to the tremendous gap between the bandwidth/power/latency of electrical memory and that of computing cores. Previous solutions fail to fully leverage the ultra-fast computational speed of emerging DNN accelerators to break through the critical memory bound. In this work, we propose a general and unified framework that trades expensive memory transactions for ultra-fast on-chip computations, directly translating to performance improvements. We are the first to jointly explore the intrinsic correlations and bit-level redundancy within DNN kernels, and we propose a multi-level in situ generation mechanism with mixed-precision bases to achieve on-the-fly recovery of high-resolution parameters with minimal hardware overhead. Extensive experiments demonstrate that our proposed joint method can boost memory efficiency by 10-20x with comparable accuracy over four state-of-the-art designs, when benchmarked on ResNet-18/DenseNet-121/MobileNetV2/V3 across various tasks.
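To make the core idea concrete, below is a minimal NumPy sketch of what "in situ generation with mixed-precision bases" can look like. It is not the authors' exact algorithm: the paper learns its bases during training and uses a multi-level scheme, whereas this sketch uses a one-shot SVD factorization as the "intrinsic correlation" step and uniform quantization as the "bit-level redundancy" step. All shapes, ranks, and bit-widths here are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits):
    # Uniform symmetric quantization; returns integer codes and a scale factor.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)

# Hypothetical kernel: real DNN kernels exhibit intrinsic correlation, so we
# synthesize a nearly low-rank 256x256 weight matrix to stand in for one.
W = (rng.standard_normal((256, 16)) @ rng.standard_normal((16, 256))
     + 0.01 * rng.standard_normal((256, 256))).astype(np.float32)

# Level 1 -- exploit correlation: factor W into a small basis and coefficients.
rank = 16
U, S, Vt = np.linalg.svd(W, full_matrices=False)
B = (U[:, :rank] * S[:rank]).astype(np.float32)  # basis,        256 x 16
C = Vt[:rank, :].astype(np.float32)              # coefficients,  16 x 256

# Level 2 -- exploit bit-level redundancy: store the factors at mixed precision.
Bq, sB = quantize(B, bits=8)   # higher-precision basis
Cq, sC = quantize(C, bits=4)   # low-precision coefficients

# In situ generation: only Bq/Cq leave memory; the full-resolution kernel is
# regenerated on the fly by a cheap on-chip matrix multiply.
W_hat = dequantize(Bq, sB) @ dequantize(Cq, sC)

stored_bits = Bq.size * 8 + Cq.size * 4   # what must actually be fetched
full_bits = W.size * 32                   # the fp32 kernel it replaces
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"memory reduction ~{full_bits / stored_bits:.1f}x, relative error {rel_err:.3f}")
```

The trade-off this illustrates is the paper's central one: the small quantized factors are what cross the memory boundary, and the full-resolution weights are regenerated by on-chip compute, which emerging accelerators (e.g., photonic matrix-multiply engines) can perform far faster than they can fetch data. In the actual framework, accuracy is preserved by training the bases jointly with the network rather than post-hoc factorization as done here.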

Updated: 2021-08-27