Optimizing Memory-Access Patterns for Deep Learning Accelerators
arXiv - CS - Performance. Pub Date: 2020-02-27, DOI: arxiv-2002.12798
Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, Jiading Gai, Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, Yida Wang

Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.
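
The abstract does not include implementation details, so the following toy Python sketch only illustrates the general kind of saving at stake: when two operators are analyzed together, the intermediate tensor can be staged tile-by-tile in the software-managed scratchpad instead of round-tripping through off-chip memory. Everything here is hypothetical (the function names, the tile size, and the access counters are a simplified cost model, not Inferentia's), and the polyhedral analysis itself is not shown.

    import numpy as np

    TILE = 1024  # assumed scratchpad tile size, in elements (hypothetical)

    def unfused(x):
        # Each operator runs to completion, so the intermediate tensor y
        # is written to and read back from off-chip memory (DRAM).
        dram_accesses = 0
        y = x + 1.0                            # op1
        dram_accesses += x.size + y.size       # read x, write y
        z = np.maximum(y, 0.0)                 # op2
        dram_accesses += y.size + z.size       # read y, write z
        return z, dram_accesses

    def fused_tiled(x):
        # Both operators run per tile, so the intermediate stays in the
        # (simulated) scratchpad and never touches off-chip memory.
        z = np.empty_like(x)
        dram_accesses = 0
        for start in range(0, x.size, TILE):
            tile = x[start:start + TILE]       # stage one tile on-chip
            dram_accesses += tile.size         # read tile from DRAM
            tmp = tile + 1.0                   # intermediate stays on-chip
            z[start:start + TILE] = np.maximum(tmp, 0.0)
            dram_accesses += tile.size         # write result tile to DRAM
        return z, dram_accesses

    x = np.random.randn(1 << 16).astype(np.float32)
    z1, a1 = unfused(x)
    z2, a2 = fused_tiled(x)
    assert np.allclose(z1, z2)
    print(f"DRAM accesses, unfused: {a1}, fused: {a2}")  # fused does half

In this simplified model the fused schedule performs 2N off-chip accesses instead of 4N for two elementwise operators; the paper's contribution is deciding such staging globally, across all operators of a model, rather than one operator at a time.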

Updated: 2020-03-02