Lotus
ACM Journal on Emerging Technologies in Computing Systems (IF 2.1), Pub Date: 2020-09-17, DOI: 10.1145/3415749
Yunfeng Lu 1, Huaxi Gu 1, Xiaoshan Yu 1, Krishnendu Chakrabarty 2

Machine learning is at the heart of many services provided by data centers. To improve the performance of machine learning, several parameter (gradient) synchronization methods have been proposed in the literature. These synchronization algorithms have different communication characteristics and accordingly place different demands on the network architecture. However, traditional data-center networks cannot easily meet these demands. Therefore, we analyze the communication profiles associated with several common synchronization algorithms and propose a machine-learning-oriented network architecture to match their characteristics. The proposed design, named Lotus because it resembles a lotus flower, is a hybrid optical/electrical architecture based on arrayed waveguide grating routers (AWGRs). In Lotus, a complete bipartite graph is used within each group to improve bisection bandwidth and scalability. Each pair of groups is connected by an optical link, and AWGRs between adjacent groups enhance path diversity and network reliability. We also present an efficient routing algorithm that makes full use of the path diversity of Lotus, which further increases network performance. Simulation results show that the network performance of Lotus is better than that of Dragonfly and 3D-Torus under realistic traffic patterns for different synchronization algorithms.
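
As a rough illustration of the structure described in the abstract, the sketch below builds a toy "Lotus-like" topology: a complete bipartite graph inside each group and one link per pair of groups. It is only a sketch under stated assumptions and not the authors' construction. The function name `build_lotus_like_topology`, the parameters `num_groups`, `left_size`, and `right_size`, and the node labels are all hypothetical, and the AWGRs between adjacent groups and the routing algorithm are not modeled.

```python
from itertools import combinations, product


def build_lotus_like_topology(num_groups, left_size, right_size):
    """Return (intra_links, inter_links) for a toy Lotus-like topology.

    Assumption: each group is split into two node sets ("L" and "R") that
    form a complete bipartite graph. The abstract does not specify what
    the two sets contain, so these labels are purely illustrative.
    """
    intra_links = []
    for g in range(num_groups):
        left = [(g, "L", i) for i in range(left_size)]
        right = [(g, "R", j) for j in range(right_size)]
        # Complete bipartite graph: every left node links to every right node.
        intra_links.extend(product(left, right))

    # One optical link per unordered pair of groups, as the abstract states.
    inter_links = list(combinations(range(num_groups), 2))
    return intra_links, inter_links


intra, inter = build_lotus_like_topology(num_groups=4, left_size=3, right_size=3)
print(len(intra), len(inter))  # prints "36 6"
```

With 4 groups of 3 + 3 nodes, each group contributes 3 × 3 = 9 intra-group links, and the 4 groups form 4 × 3 / 2 = 6 group pairs.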

Updated: 2020-09-17