OPTWEB: A Lightweight Fully Connected Inter-FPGA Network for Efficient Collectives, IEEE Transactions on Computers


IEEE Transactions on Computers (IF 3.6), Pub Date: 2021-03-24, DOI: 10.1109/tc.2021.3068715
Kenji Mizutani, Hiroshi Yamaguchi, Yutaka Urino, Michihiro Koibuchi

Modern FPGA accelerators can be equipped with many high-bandwidth network I/Os, e.g., 64 × 50 Gbps, enabled by onboard optics or co-packaged optics. A few dozen tightly coupled FPGA accelerators form an emerging computing platform for distributed data processing. However, a conventional indirect packet network built from Ethernet intellectual property (IP) cores requires an unacceptably large amount of FPGA logic to handle such high-bandwidth interconnects. An alternative to the indirect network is a direct packet network. Existing direct inter-FPGA networks use a low-radix topology, e.g., a 2-D torus, which has the disadvantage of a large diameter and a large average shortest path length, both of which increase the latency of collectives. To mitigate both problems, we propose a lightweight, fully connected inter-FPGA network called OPTWEB for efficient collectives. Since all end-to-end communication paths are separate and statically established using onboard optics, raw block data can be transferred with simple link-level synchronization. Once each source FPGA assigns a communication stream to a path, using its internal switch logic between the memory-mapped and stream interfaces for remote direct memory access (RDMA), the transfer completes in one hop. Since each FPGA performs remote memory access input and output with all other FPGAs simultaneously, multiple RDMAs efficiently form collectives. The OPTWEB network provides a 0.71-μs start-up latency for collectives among multiple Intel Stratix 10 MX FPGA cards with onboard optics. It consumes 31.4 and 57.7 percent of the adaptive logic modules for aggregate 400-Gbps and 800-Gbps interconnects, respectively, on a custom Stratix 10 MX 2100 FPGA. The OPTWEB network reduces cost by 40 percent compared to a conventional packet network.
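
The topology argument in the abstract (a low-radix 2-D torus has a larger diameter and average shortest path length than a fully connected network) can be illustrated with a small hop-count calculation. The sketch below is not from the paper: the function names and the 16-node size are assumptions chosen for illustration, and it counts hops only, ignoring link bandwidth and FPGA logic cost.

```python
from collections import deque
from itertools import product


def torus_2d(k):
    """Adjacency sets of a k x k 2-D torus (hypothetical helper, not from the paper)."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        neighbours = {
            ((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k),
        }
        adj[(x, y)] = neighbours - {(x, y)}
    return adj


def fully_connected(n):
    """Adjacency sets of an n-node network where every pair has a direct link."""
    return {u: {v for v in range(n) if v != u} for u in range(n)}


def hop_stats(adj):
    """Return (diameter, average shortest path length) using BFS from every node."""
    longest, total, pairs = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return longest, total / pairs


if __name__ == "__main__":
    # 16 nodes is an assumed size; the abstract only says "some dozens" of FPGAs.
    print("4x4 torus      :", hop_stats(torus_2d(4)))
    print("fully connected:", hop_stats(fully_connected(16)))
```

For the assumed 16-node case, the 4 × 4 torus has a diameter of 4 hops and an average shortest path of 32/15 (about 2.13) hops, while the fully connected network needs exactly one hop between any pair, which is the one-hop property the abstract attributes to OPTWEB.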


Updated: 2021-05-25
