NeuronLink: An Efficient Chip-to-Chip Interconnect for Large-Scale Neural Network Accelerators
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8). Pub Date: 2020-09-01. DOI: 10.1109/tvlsi.2020.3008185
Shanlin Xiao, Yuhao Guo, Wenkang Liao, Huipeng Deng, Yi Luo, Huanliang Zheng, Jian Wang, Cheng Li, Gezi Li, Zhiyi Yu

Large-scale neural network (NN) accelerators typically consist of several processing nodes, which can be implemented as a multi- or many-core chip and organized via a network-on-chip (NoC) to handle the heavy neuron-to-neuron traffic. Multiple NoC-based NN chips are connected through chip-to-chip interconnection networks to further boost the overall neural acceleration capability. Huge amounts of multicast-based traffic travel on-chip or across chips, making the interconnection network design more challenging and turning it into the bottleneck of NN system performance and energy. In this article, we propose coupled intrachip and interchip communication techniques, called NeuronLink, for NN accelerators. For intrachip communication, we propose scoring crossbar arbitration, arbitration interception, and route computation parallelization techniques for virtual-channel routing, leading to a high-throughput NoC with lower hardware cost for multicast-based traffic. For interchip communication, we propose a lightweight and NoC-aware chip-to-chip interconnection scheme, enabling efficient interconnection of NoC-based NN chips. In addition, we evaluate the proposed techniques on four interconnected NoC-based deep neural network (DNN) chips implemented with four field-programmable gate arrays (FPGAs). The experimental results show that the proposed interconnection network can efficiently manage the data traffic inside DNNs with high throughput and low overhead compared with state-of-the-art interconnects.
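The abstract names scoring crossbar arbitration as one of the virtual-channel routing techniques. The sketch below is a hypothetical illustration of the general idea of score-based output arbitration, not the paper's actual mechanism: the scoring function, port model, and tie-breaking here are all assumptions made for illustration.

```python
# Hypothetical sketch of score-based crossbar arbitration.
# Assumption: each input port requesting an output port carries a
# numeric score, and the arbiter grants each output port to its
# highest-scoring requester. The paper's real scoring policy is
# not specified in the abstract.

def score_arbitrate(requests):
    """requests: dict mapping input port -> (output port, score).
    Returns a dict mapping output port -> granted input port."""
    best = {}  # output port -> (score, input port)
    for inp, (out, score) in requests.items():
        if out not in best or score > best[out][0]:
            best[out] = (score, inp)
    return {out: inp for out, (_, inp) in best.items()}

# Example: inputs 0 and 2 both request output 1; input 2 has the
# higher score and wins, while input 1 gets output 0 uncontested.
reqs = {0: (1, 5), 1: (0, 2), 2: (1, 7)}
print(score_arbitrate(reqs))  # {1: 2, 0: 1}
```

In a real router this decision would be made per cycle in hardware, with losing requesters retrying (or, per the paper's arbitration-interception idea, being handled so that multicast traffic does not repeatedly lose arbitration); the sketch only shows the grant selection itself.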

Updated: 2020-09-01