Deep Partitioned Training From Near-Storage Computing to DNN Accelerators
IEEE Computer Architecture Letters (IF 1.4), Pub Date: 2021-05-19, DOI: 10.1109/lca.2021.3081752
Yongjoo Jang, Sejin Kim, Daehoon Kim, Sungjin Lee, Jaeha Kung

In this letter, we present deep partitioned training to accelerate the computations involved in training DNN models. This is the first work that partitions a DNN model across storage devices, an NPU, and a host CPU, forming a unified compute node for training workloads. To validate the benefit of the proposed system during DNN training, a trace-based simulator or an FPGA prototype is used to estimate the overall performance and to find the partition layer index that yields the minimum latency. As a case study, we select two benchmarks: vision-related tasks and a recommendation system. As a result, the training time is reduced by 12.2~31.0 percent with four near-storage computing devices in the vision-related tasks (mini-batch size of 512), and by 40.6~44.7 percent with one near-storage computing device in the selected recommendation system (mini-batch size of 64).
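The abstract does not spell out the cost model behind the partition-point search, so the sketch below is only an illustration under an assumed additive model: per-layer latencies on the near-storage computing (NSC) side and on the NPU side, plus an activation-transfer cost at the split boundary. All function and parameter names (best_partition, nsc_ms, npu_ms, transfer_ms) and the numbers in the usage example are hypothetical, not from the paper.

```python
# Hypothetical sketch: pick the layer index that minimizes estimated step latency
# when layers 0..k-1 run on near-storage compute and layers k.. run on the NPU.
from typing import List, Tuple

def best_partition(nsc_ms: List[float],       # assumed per-layer latency near storage (ms)
                   npu_ms: List[float],       # assumed per-layer latency on the NPU (ms)
                   transfer_ms: List[float]   # assumed activation hand-off cost after layer i (ms)
                   ) -> Tuple[int, float]:
    """Return (k, latency): the split index with the minimum estimated latency."""
    n = len(nsc_ms)
    best_k, best_t = 0, float("inf")
    for k in range(n + 1):                       # k = 0 means everything runs on the NPU
        t = sum(nsc_ms[:k]) + sum(npu_ms[k:])    # compute time on each side of the split
        if 0 < k < n:
            t += transfer_ms[k - 1]              # cost of moving activations at the boundary
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t

if __name__ == "__main__":
    # Made-up per-layer numbers where offloading the first layer near storage wins.
    k, t = best_partition(nsc_ms=[1.0, 5.0, 6.0],
                          npu_ms=[3.0, 2.0, 2.5],
                          transfer_ms=[0.5, 1.0, 0.0])
    print(f"split after layer {k}, estimated step latency {t:.1f} ms")
```

In the paper, the corresponding latency estimates come from the trace-based simulator or the FPGA prototype rather than from fixed per-layer constants; the search over candidate split indices is the part this sketch mirrors.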

Updated: 2021-05-19