Efficient Algorithms for Device Placement of DNN Graph Operators
arXiv - CS - Machine Learning. Pub Date: 2020-06-29, DOI: arxiv-2006.16423
Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

Modern machine learning workloads use large models with complex structures that are very expensive to execute. The devices that execute these models are becoming increasingly heterogeneous, as a flourishing of domain-specific hardware accelerators is offered alongside CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices that is fed a stream of samples and yields high throughput for training and inference of DNNs. However, such settings (large models and multiple heterogeneous devices) require automated algorithms and toolchains to partition the ML workload across devices. In this paper, we identify and isolate the structured optimization problem at the core of device placement of DNN operators, for both inference and training, especially in modern pipelined settings. We then provide algorithms that solve this problem to optimality. We demonstrate the applicability and efficiency of our approaches on several contemporary DNN computation graphs.
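To illustrate the flavor of the optimization problem the abstract describes, here is a minimal sketch of pipeline partitioning: splitting a chain of operators into contiguous stages, one per device, so that the slowest (bottleneck) stage is as fast as possible, which maximizes steady-state pipeline throughput. This is a simplification for illustration only: it assumes a linear chain with known per-operator compute costs, identical devices, and no communication cost, whereas the paper addresses general DNN graphs and heterogeneous devices; the function name `min_bottleneck_partition` is hypothetical and not from the paper.

```python
def min_bottleneck_partition(costs, k):
    """Split a chain of per-operator costs into k contiguous stages,
    minimizing the cost of the slowest (bottleneck) stage.

    Returns (bottleneck_cost, list_of_stages). Illustrative only:
    ignores communication cost and device heterogeneity.
    """
    n = len(costs)
    # prefix[i] = total cost of the first i operators
    prefix = [0.0] * (n + 1)
    for i, c in enumerate(costs):
        prefix[i + 1] = prefix[i] + c

    INF = float("inf")
    # dp[j][i] = best bottleneck when the first i ops use j stages
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for m in range(j - 1, i):  # last stage covers ops m..i-1
                cand = max(dp[j - 1][m], prefix[i] - prefix[m])
                if cand < dp[j][i]:
                    dp[j][i] = cand
                    cut[j][i] = m

    # Reconstruct the stage boundaries by walking the cut points back.
    stages, i = [], n
    for j in range(k, 0, -1):
        m = cut[j][i]
        stages.append(costs[m:i])
        i = m
    stages.reverse()
    return dp[k][n], stages
```

For example, `min_bottleneck_partition([4, 2, 1, 3, 5], 2)` places the first three operators on one device and the last two on another, giving a bottleneck stage cost of 8. The paper's contribution lies beyond this toy setting: exact algorithms for general operator graphs, training as well as inference, and heterogeneous devices.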

Updated: 2020-11-02