CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning
arXiv - CS - Programming Languages. Pub Date: 2021-05-12. DOI: arXiv:2105.05720. Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Madanlal Musuvathi, Olli Saarikivi, Todd Mytkowicz, Youshan Miao
Modern deep learning workloads run on distributed hardware and are difficult
to optimize: data, model, and pipeline parallelism require a developer to
thoughtfully restructure their workload around optimized computation and
communication kernels in libraries such as cuBLAS and NCCL. The logical
separation between computation and communication leaves performance on the
table, with missed optimization opportunities across abstraction boundaries. To
explore these opportunities, this paper presents CoCoNet, which consists of a
compute language to express programs containing both computation and
communication, a scheduling language to apply transformations to such programs,
and a compiler to generate high-performance kernels. Providing both computation
and communication as first-class constructs enables new optimizations, such as
overlapping or fusing communication with computation. CoCoNet allowed us to
optimize several data-, model-, and pipeline-parallel workloads in existing
deep learning systems with very few lines of code. We show significant
improvements after integrating novel CoCoNet-generated kernels.
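To make the fusion idea concrete, here is a minimal sketch (not CoCoNet's actual API; the function names are illustrative) that simulates a sum-allreduce across several ranks in plain Python. The unfused version mirrors the usual library boundary: a communication call followed by a separate elementwise pass. The fused version folds the elementwise bias add into the reduction epilogue, so each rank's output is produced in a single pass.

```python
# Hedged sketch of computation/communication fusion across an allreduce.
# "Ranks" are simulated as a list of per-rank tensors (plain Python lists);
# function names (allreduce, unfused, fused) are illustrative only.

def allreduce(shards):
    """Sum-allreduce: every rank receives the elementwise sum of all shards."""
    total = [sum(vals) for vals in zip(*shards)]
    return [list(total) for _ in shards]

def unfused(shards, bias):
    """Library-boundary version: communicate first, then a second pass to add bias."""
    reduced = allreduce(shards)
    return [[r + b for r, b in zip(rank_out, bias)] for rank_out in reduced]

def fused(shards, bias):
    """Fused version: the bias add happens inside the reduction epilogue,
    avoiding an extra traversal of the output on each rank."""
    total = [sum(vals) + b for vals, b in zip(zip(*shards), bias)]
    return [list(total) for _ in shards]

if __name__ == "__main__":
    shards = [[1.0, 2.0], [3.0, 4.0]]   # two simulated ranks
    bias = [0.5, -0.5]
    assert unfused(shards, bias) == fused(shards, bias)
    print(fused(shards, bias)[0])        # each rank sees the same fused result
```

On a GPU, the unfused path launches a communication kernel and then a separate elementwise kernel over the same data; fusing them removes one full read-modify-write of the output per rank, which is the kind of cross-boundary optimization the abstract describes.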
Updated: 2021-05-13