Design of a Near-Ideal Fault-Tolerant Routing Algorithm for Network-on-Chip-Based Multicores
arXiv - CS - Hardware Architecture, Pub Date: 2020-06-19, DOI: arxiv-2006.11025
Costas Iordanou, Vassos Soteriou, Konstantinos Aisopos

With relentless CMOS technology downsizing, Networks-on-Chip (NoCs) are increasingly susceptible to wearout and suffer from reduced reliability. Faults in processors and memories may be masked via redundancy or mitigated via techniques such as task migration. NoCs, however, are especially vulnerable to hardware faults, because a single link breakdown may halt inter-tile communication indefinitely and render the whole multicore chip inoperable. NoCs therefore risk becoming the critical point of failure in the chip multicores that use them. Aiming for seamless NoC operation in the presence of faulty links, we propose Hermes, a near-ideal fault-tolerant routing algorithm that meets many objectives, among them high robustness, distributed operation, guaranteed freedom from deadlocks, and evenly balanced traffic. Hermes is a limited-overhead, deadlock-free hybrid routing algorithm: it uses load-balancing routing on fault-free paths to sustain high throughput, while selecting pre-reconfigured escape paths in the vicinity of faults. With these online mechanisms, Hermes's performance degrades gracefully as the number of faulty links increases, a crucially desirable property lacking in prior art. Additionally, when faulty links are densely clustered in the topology, Hermes identifies network partitions that cannot communicate with each other, so that packets routed toward physically isolated regions do not stall the network through indefinite chains of blockages originating at sub-network boundaries. An extensive experimental evaluation, including traffic workloads gathered from full-system chip multiprocessor simulations, shows that Hermes improves network throughput by up to 3× over the state of the art. Further, hardware synthesis results demonstrate Hermes's efficacy.
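To make the routing decisions described in the abstract more concrete, here is a minimal sketch, not the authors' implementation. It assumes a W×H 2D mesh, undirected faulty links stored as frozensets of node pairs, and per-port free-credit counts as the load metric. It shows three decisions: labeling fault-induced partitions with a breadth-first search, so packets to unreachable destinations are flagged instead of blocking the network; choosing the least-loaded healthy minimal direction on fault-free paths; and falling back to an escape path when no healthy minimal direction exists. The names `label_partitions`, `route`, `credits`, and the return codes `"DROP"` and `"ESCAPE"` are hypothetical. The escape function itself is left as a placeholder. Guaranteeing deadlock freedom needs a restricted escape routing function (up*/down* is a common choice in the literature), and this sketch does not claim to reproduce Hermes's exact pre-reconfigured escape mechanism.

```python
from collections import deque

# Hypothetical model: a W x H 2D mesh, node = (x, y). A faulty link is an
# unordered pair of adjacent nodes stored as frozenset({a, b}).
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def neighbor(node, d, w, h):
    """Return the neighbor of `node` in direction `d`, or None at the mesh edge."""
    x, y = node[0] + DIRS[d][0], node[1] + DIRS[d][1]
    return (x, y) if 0 <= x < w and 0 <= y < h else None


def link_ok(a, b, faulty):
    """True if the link between adjacent nodes a and b is not faulty."""
    return frozenset((a, b)) not in faulty


def label_partitions(w, h, faulty):
    """Label connected regions of the mesh using only fault-free links (BFS).

    Returns {node: partition_id}; nodes with different ids cannot communicate.
    """
    label = {}
    next_id = 0
    for start in ((x, y) for x in range(w) for y in range(h)):
        if start in label:
            continue
        label[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for d in DIRS:
                nb = neighbor(cur, d, w, h)
                if nb is not None and nb not in label and link_ok(cur, nb, faulty):
                    label[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return label


def productive_dirs(cur, dst):
    """Directions that move the packet one hop closer to dst (minimal routing)."""
    dirs = []
    if dst[0] > cur[0]:
        dirs.append("E")
    if dst[0] < cur[0]:
        dirs.append("W")
    if dst[1] > cur[1]:
        dirs.append("N")
    if dst[1] < cur[1]:
        dirs.append("S")
    return dirs


def route(cur, dst, w, h, faulty, credits, partitions):
    """Pick an output port for a packet at `cur` headed to `dst`.

    "LOCAL"  -> packet has arrived.
    "DROP"   -> dst lies in another partition, so never wait on it.
    "E"/...  -> least-loaded healthy productive direction (load balancing).
    "ESCAPE" -> placeholder for a deadlock-free escape routing function.
    """
    if cur == dst:
        return "LOCAL"
    if partitions[cur] != partitions[dst]:
        return "DROP"
    # Productive directions always have an in-mesh neighbor since dst is in the mesh.
    healthy = [d for d in productive_dirs(cur, dst)
               if link_ok(cur, neighbor(cur, d, w, h), faulty)]
    if healthy:
        # Prefer the port with the most free downstream credits (ties: first found).
        return max(healthy, key=lambda d: credits.get(d, 0))
    return "ESCAPE"
```

For example, on a 3×3 mesh where both links of node (2, 2) have failed, `label_partitions` places (2, 2) in its own partition and `route((0, 0), (2, 2), ...)` returns `"DROP"`. On a fault-free mesh with `credits={"E": 2, "N": 5}`, a packet from (0, 0) to (1, 1) is sent `"N"`. The partition check runs before any port selection so that an unreachable destination is detected at once and never holds a buffer while waiting for a path that does not exist.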

Updated: 2020-06-22