CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning
arXiv - CS - Programming Languages. Pub Date: 2021-05-12, DOI: arxiv-2105.05720
Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Madanlal Musuvathi, Olli Saarikivi, Todd Mytkowicz, Youshan Miao

Modern deep learning workloads run on distributed hardware and are difficult to optimize -- data, model, and pipeline parallelism require a developer to thoughtfully restructure their workload around optimized computation and communication kernels in libraries such as cuBLAS and NCCL. The logical separation between computation and communication leaves performance on the table with missed optimization opportunities across abstraction boundaries. To explore these opportunities, this paper presents CoCoNet, which consists of a compute language to express programs with both computation and communication, a scheduling language to apply transformations on such programs, and a compiler to generate high performance kernels. Providing both computation and communication as first class constructs enables new optimizations, such as overlapping or fusion of communication with computation. CoCoNet allowed us to optimize several data, model and pipeline parallel workloads in existing deep learning systems with very few lines of code. We show significant improvements after integrating novel CoCoNet generated kernels.
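
Since the paper's central claim is that expressing collectives and computation in one program enables overlap and fusion, a short illustration of the overlap idea may help. The sketch below uses PyTorch's torch.distributed as a stand-in; the function name and the choice of a matmul as the independent computation are hypothetical, and this is not CoCoNet's actual DSL, whose compiler emits fused kernels rather than scheduling existing framework calls.

    import torch
    import torch.distributed as dist

    # Assumes dist.init_process_group(...) has already been called.
    def overlapped_allreduce_matmul(grad: torch.Tensor,
                                    x: torch.Tensor,
                                    w: torch.Tensor) -> torch.Tensor:
        # Launch the gradient all-reduce asynchronously; NCCL runs it on
        # its own stream while this process keeps issuing work.
        work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
        # Run an independent matmul while the collective is in flight.
        y = x @ w
        # Block until the all-reduce completes before anyone reads the
        # reduced gradient.
        work.wait()
        return y

The same pattern underlies CoCoNet's scheduling transformations: once communication and computation are both visible in one program, the compiler can legally reorder, overlap, or fuse them instead of treating each library kernel as an opaque boundary.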

Updated: 2021-05-13