Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models
arXiv - CS - Hardware Architecture. Pub Date: 2021-03-01, DOI: arxiv-2103.00768
Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

As the need for edge computing grows, many modern consumer devices now contain edge machine learning (ML) accelerators that can compute a wide range of neural network (NN) models while still fitting within tight resource constraints. We analyze a commercial Edge TPU using 24 Google edge NN models (including CNNs, LSTMs, transducers, and RCNNs), and find that the accelerator suffers from three shortcomings, in terms of computational throughput, energy efficiency, and memory access handling. We comprehensively study the characteristics of each NN layer in all of the Google edge models, and find that these shortcomings arise from the one-size-fits-all approach of the accelerator, as there is a high amount of heterogeneity in key layer characteristics both across different models and across different layers in the same model. We propose a new acceleration framework called Mensa. Mensa incorporates multiple heterogeneous ML edge accelerators (including both on-chip and near-data accelerators), each of which caters to the characteristics of a particular subset of models. At runtime, Mensa schedules each layer to run on the best-suited accelerator, accounting for both efficiency and inter-layer dependencies. As we analyze the Google edge NN models, we discover that all of the layers naturally group into a small number of clusters, which allows us to design an efficient implementation of Mensa for these models with only three specialized accelerators. Averaged across all 24 Google edge models, Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Edge TPU, and by 2.4x and 4.3x over Eyeriss v2, a state-of-the-art accelerator.
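
To make the layer-clustering and scheduling idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the arithmetic-intensity heuristic, its thresholds, the accelerator names, and the example layers are all illustrative assumptions; only the overall shape (three specialized accelerators, one per layer cluster, with dependency-ordered dispatch) follows the abstract.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Layer:
    name: str
    macs: int          # multiply-accumulate operations in this layer
    param_bytes: int   # parameter footprint in bytes
    deps: List[str] = field(default_factory=list)  # upstream layer names

def cluster(layer: Layer) -> str:
    """Bucket a layer by a crude arithmetic-intensity heuristic
    (MACs per parameter byte); thresholds are made up for the sketch."""
    intensity = layer.macs / max(layer.param_bytes, 1)
    if intensity > 100:
        return "compute-centric"   # dense, high-reuse layers (e.g., early convs)
    if layer.param_bytes > 4_000_000:
        return "data-centric"      # large-footprint, memory-bound layers
    return "lightweight"           # small, low-intensity layers

# One specialized accelerator per cluster (three, matching the paper's
# design point for the Google edge models); names/placements are hypothetical.
ACCEL_FOR_CLUSTER = {
    "compute-centric": "Accel-A (on-chip)",
    "data-centric":    "Accel-B (near-data)",
    "lightweight":     "Accel-C (on-chip, minimal)",
}

def schedule(layers: List[Layer]) -> List[Tuple[str, str]]:
    """Dispatch each layer to its cluster's accelerator while honoring
    inter-layer dependencies (simple topological order)."""
    done, order, pending = set(), [], list(layers)
    while pending:
        ready = next((l for l in pending if all(d in done for d in l.deps)), None)
        if ready is None:
            raise ValueError("dependency cycle among layers")
        order.append((ready.name, ACCEL_FOR_CLUSTER[cluster(ready)]))
        done.add(ready.name)
        pending.remove(ready)
    return order

if __name__ == "__main__":
    net = [
        Layer("conv1", macs=500_000_000, param_bytes=300_000),
        Layer("lstm1", macs=40_000_000,  param_bytes=16_000_000, deps=["conv1"]),
        Layer("fc1",   macs=2_000_000,   param_bytes=80_000,     deps=["lstm1"]),
    ]
    for name, accel in schedule(net):
        print(f"{name} -> {accel}")

Running the sketch sends the convolution to the compute-centric accelerator, the parameter-heavy LSTM to the near-data one, and the small fully connected layer to the lightweight one, which mirrors the abstract's observation that layers naturally group into a few clusters with distinct resource needs.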

Updated: 2021-03-02