HopliteBuf
ACM Transactions on Reconfigurable Technology and Systems (IF 3.1), Pub Date: 2020-04-04, DOI: 10.1145/3375899
Tushar Garg, Saud Wasly, Rodolfo Pellizzoni, Nachiket Kapre

HopliteBuf is a deflection-free, low-cost, and high-speed FPGA overlay network-on-chip (NoC) with stall-free buffers. It is an FPGA-friendly 2D unidirectional torus topology built on top of the HopliteRT overlay NoC. The stall-free buffers in HopliteBuf are supported by static analysis tools based on network calculus that help determine worst-case FIFO occupancy bounds for a prescribed workload. We implement these FIFOs using cheap LUT SRAMs (Xilinx SRL32s and Intel MLABs) to reduce cost. HopliteBuf is a hybrid microarchitecture: its stall-free buffers provide the performance benefits of conventional buffered NoCs, while its lightweight unidirectional torus topology retains the cost advantages of deflection-routed NoCs. We present two design variants of the HopliteBuf NoC: (1) single corner-turn FIFO (WS) and (2) dual corner-turn FIFO (WS+N). The single corner-turn (WS) design is simpler and only introduces a buffering requirement for packets changing dimension from the X ring to the downhill Y ring (that is, West to South). The dual corner-turn variant requires two FIFOs, one for turning packets going downhill (WS) and one for those going uphill (WN). At the expense of a small increase in resource cost, the dual corner-turn design overcomes the mathematical analysis challenges that single corner-turn designs face for communication workloads with cyclic dependencies between flow traversal paths. Our static analysis delivers bounds that are not only better (in latency) than HopliteRT but also tighter by 2–3×. Across 100 randomly generated flowsets mapped to a 5×5 system size, HopliteBuf is able to route a larger fraction of these flowsets with <128-deep FIFOs, improve worst-case routing latency by ≈2× for mutually feasible flowsets, and support a 10% higher injection rate than HopliteRT. At 20% injection rates, HopliteRT is only able to route 1–2% of the flowsets, while HopliteBuf can deliver 40–50% sustainability. When compared to the WS bkp backpressure-based router, we observe that our HopliteBuf solution offers 25–30% better feasibility at 30–40% lower LUT cost.
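
To give a feel for how a network-calculus analysis can bound FIFO occupancy, the following is a minimal textbook sketch and not the paper's actual analysis tool. It assumes a single flow with a token-bucket arrival curve α(t) = b + r·t feeding a single server with a rate-latency service curve β(t) = R·max(0, t − T); under those assumptions the standard backlog bound is b + r·T and the delay bound is T + b/R (valid when r ≤ R). The class names, function names, and numeric parameters below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Assumed arrival curve: alpha(t) = burst + rate * t."""
    burst: float  # packets that may arrive at once
    rate: float   # sustained packets per cycle


@dataclass(frozen=True)
class RateLatency:
    """Assumed service curve: beta(t) = rate * max(0, t - latency)."""
    rate: float     # packets served per cycle once service starts
    latency: float  # worst-case cycles before service starts


def _check_stable(flow: TokenBucket, server: RateLatency) -> None:
    # The closed-form bounds only hold when the server keeps up on average.
    if flow.rate > server.rate:
        raise ValueError("arrival rate exceeds service rate; no finite bound")


def backlog_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Worst-case queue occupancy: largest vertical gap between alpha and beta."""
    _check_stable(flow, server)
    return flow.burst + flow.rate * server.latency


def delay_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Worst-case waiting time: largest horizontal gap between alpha and beta."""
    _check_stable(flow, server)
    return server.latency + flow.burst / server.rate


# Hypothetical numbers: a flow with a 4-packet burst at 25% injection rate,
# turning into a ring that may make it wait up to 8 cycles.
flow = TokenBucket(burst=4, rate=0.25)
server = RateLatency(rate=1.0, latency=8)

depth = backlog_bound(flow, server)
print("FIFO depth needed:", depth)
print("Worst-case delay (cycles):", delay_bound(flow, server))
print("Fits in a 32-deep FIFO:", depth <= 32)
```

If the computed backlog bound stays below the FIFO depth, the buffer never fills in this simplified model, which is the intuition behind sizing a buffer so that it never has to stall its producer. A real analysis would have to account for multiple interfering flows and multi-hop paths, which this single-node sketch ignores.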

Updated: 2020-04-04