Abstract
HopliteBuf is a deflection-free, low-cost, and high-speed FPGA overlay Network-on-Chip (NoC) with stall-free buffers. It uses an FPGA-friendly 2D unidirectional torus topology and is built on top of the HopliteRT overlay NoC. The stall-free buffers in HopliteBuf are supported by static analysis tools based on network calculus that determine worst-case FIFO occupancy bounds for a prescribed workload. We implement these FIFOs using cheap LUT SRAMs (Xilinx SRL32s and Intel MLABs) to reduce cost. HopliteBuf is a hybrid microarchitecture: its stall-free buffers provide the performance benefits of conventional buffered NoCs, while its lightweight unidirectional torus topology retains the cost advantages of deflection-routed NoCs. We present two design variants of the HopliteBuf NoC: (1) single corner-turn FIFO (W → S) and (2) dual corner-turn FIFO (W → S+N). The single corner-turn (W → S) design is simpler and only introduces a buffering requirement for packets changing dimension from the X ring to the downhill Y ring (West to South). The dual corner-turn variant requires two FIFOs, one for turning packets going downhill (W → S) and one for those going uphill (W → N). At the expense of a small increase in resource cost, the dual corner-turn design overcomes the mathematical analysis challenges that single corner-turn designs face for communication workloads with cyclic dependencies between flow traversal paths. Our static analysis delivers bounds that are not only better (in latency) than HopliteRT's but also tighter by 2–3×. Across 100 randomly generated flowsets mapped to a 5×5 system, HopliteBuf is able to route a larger fraction of these flowsets with <128-deep FIFOs, improve worst-case routing latency by ≈2× for mutually feasible flowsets, and support a 10% higher injection rate than HopliteRT. At 20% injection rates, HopliteRT is only able to route 1–2% of the flowsets, while HopliteBuf can deliver 40–50% sustainability. When compared to the W → S_bkp backpressure-based router, we observe that our HopliteBuf solution offers 25–30% better feasibility at 30–40% lower LUT cost.
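To give a sense of how network calculus turns a workload description into a FIFO depth, the following minimal sketch applies the textbook single-node bounds for a token-bucket arrival curve and a rate-latency service curve. It is not the paper's analysis, which must handle many interacting flows and cyclic dependencies. The function name `fifo_bounds`, its parameters, and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def fifo_bounds(burst, rate, service_rate, service_latency):
    """Textbook network-calculus bounds for a single FIFO (illustrative only).

    Assumed arrival curve:  alpha(t) = burst + rate * t            (token bucket)
    Assumed service curve:  beta(t)  = service_rate * max(0, t - service_latency)

    Units are packets and cycles. Returns (backlog_bound, delay_bound).
    """
    if rate > service_rate:
        raise ValueError("unbounded backlog: arrival rate exceeds service rate")
    # Backlog bound: maximum vertical distance between alpha and beta.
    backlog = burst + rate * service_latency
    # Delay bound: maximum horizontal distance between alpha and beta.
    delay = service_latency + burst / service_rate
    # Round the backlog up to a whole number of FIFO entries.
    return math.ceil(backlog), delay


# Hypothetical flow: bursts of 4 packets, 0.25 packets/cycle sustained,
# served at 1 packet/cycle after waiting at most 8 cycles.
depth, latency = fifo_bounds(burst=4, rate=0.25, service_rate=1.0, service_latency=8)
print(depth, latency)
```

Under these assumed numbers the bound is 6 FIFO entries, which would fit in a single 32-deep LUT SRAM. A FIFO sized to the backlog bound never fills under the prescribed workload, which is the sense in which it needs no backpressure.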
Index Terms
- HopliteBuf: Network Calculus-Based Design of FPGA NoCs with Provably Stall-Free FIFOs