Data scheduling and placement in deep learning accelerator
Cluster Computing (IF 4.4) Pub Date: 2021-07-10, DOI: 10.1007/s10586-021-03355-8
Seyedeh Yasaman Hosseini Mirmahaleh, Midia Reshadi, Nader Bagherzadeh, Ahmad Khademzadeh

Deep neural networks (DNNs), a popular class of machine learning (ML) algorithms, have been deployed on a wide range of devices for Internet of Things (IoT) applications, data mining in cloud computing, and web search engines; ML has had a particularly strong impact on IoT edge-level nodes. Deploying DNN-based applications raises memory-access problems, including communication delay, energy efficiency, and bandwidth requirements. We propose a bus-scheduling scheme for data placement on distributed local buffers in a deep learning accelerator (DLA). The contributions of this paper are: (1) a method for data-flow mapping between off-chip DRAM and distributed local buffers, together with a flow-mapping approach for data transfer between the distributed local buffers and processing elements (PEs); (2) the use of distributed local buffers in four directions to distribute traffic over a mesh according to the memory-access mechanism; and (3) bus scheduling for data placement on the distributed local buffers. Simulated experiments based on typical DNN workloads (AlexNet, VGG-16, and GoogLeNet) demonstrate the effectiveness of the design: (1) the scheduling and mapping methods improve total runtime and bandwidth requirement by approximately 42.29% and 88.95%, respectively, compared with the TPU; and (2) our methods reduce the total runtime of row-column stationary plus by approximately 99% compared with the weight-stationary data flow in CONV1 and CONV11 of VGG-16. This work reports simulation results for distributing the traffic of AlexNet, VGG-16, and GoogLeNet as popular CNN and DNN models, and also examines the method's efficiency for other trained models.
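The abstract describes the mechanism only at a high level. As a rough illustration of the general idea of bus-scheduled data placement onto four directional local buffers of a mesh DLA, the following Python sketch assumes a round-robin bus policy and edge-based delivery to PEs; all names (BUFFER_DIRS, schedule_bus, map_to_pes), tile sizes, and policies are hypothetical and are not taken from the paper.

# Hypothetical sketch: round-robin bus scheduling of DNN layer tiles onto
# four directional local buffers of a mesh-based DLA. Tile sizes, the
# round-robin policy, and edge-based PE delivery are illustrative
# assumptions; the paper's actual algorithms are not reproduced here.
from collections import deque

BUFFER_DIRS = ["north", "south", "east", "west"]  # assumed four-direction layout

def make_tiles(layer_weights, tile_size):
    """Split a flat weight array into fixed-size tiles (last tile may be short)."""
    return [layer_weights[i:i + tile_size]
            for i in range(0, len(layer_weights), tile_size)]

def schedule_bus(tiles):
    """Assign each DRAM tile to a directional buffer in round-robin order,
    emulating one bus transaction per time slot."""
    buffers = {d: deque() for d in BUFFER_DIRS}
    schedule = []  # (time_slot, direction, tile_index)
    for t, tile in enumerate(tiles):
        direction = BUFFER_DIRS[t % len(BUFFER_DIRS)]
        buffers[direction].append(tile)
        schedule.append((t, direction, t))
    return buffers, schedule

def map_to_pes(buffers, mesh_rows, mesh_cols):
    """Drain each directional buffer toward the nearest mesh edge of PEs
    (a simplistic stand-in for the paper's flow-mapping step)."""
    pe_inputs = {(r, c): [] for r in range(mesh_rows) for c in range(mesh_cols)}
    edge_pes = {
        "north": [(0, c) for c in range(mesh_cols)],
        "south": [(mesh_rows - 1, c) for c in range(mesh_cols)],
        "west":  [(r, 0) for r in range(mesh_rows)],
        "east":  [(r, mesh_cols - 1) for r in range(mesh_rows)],
    }
    for direction, queue in buffers.items():
        targets = edge_pes[direction]
        i = 0
        while queue:
            pe_inputs[targets[i % len(targets)]].append(queue.popleft())
            i += 1
    return pe_inputs

if __name__ == "__main__":
    weights = list(range(64))            # stand-in for one CONV layer's weights
    tiles = make_tiles(weights, tile_size=8)
    buffers, schedule = schedule_bus(tiles)
    pe_inputs = map_to_pes(buffers, mesh_rows=4, mesh_cols=4)
    print(f"{len(schedule)} bus slots, "
          f"{sum(len(v) for v in pe_inputs.values())} tiles delivered to PEs")

A realistic DLA schedule would additionally account for buffer capacities, data reuse under a given stationary dataflow, and on-chip hop counts, none of which this toy sketch models.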




Updated: 2021-07-12