Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy
Journal of Signal Processing Systems (IF 1.6), Pub Date: 2019-09-07, DOI: 10.1007/s11265-019-01476-3
Yung-Chang Chang, Cihun-Siyong Alex Gong, Ching-Te Chiu

Aggressively scaled CMOS technology increasingly threatens the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or broken link may isolate a fully functional processing element (PE). In addition, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that differs from the traditional microarchitecture-level redundancy (MLR) approach and mitigates the isolated-PE and isolated-region problems. RLR is created by simply adding one spare router to each router set in the mesh, which also diversifies the connection paths between adjacent routers. To exploit this extra resource, two reconfiguration algorithms are presented that detour around detected faulty routers and links. The proposed RLR scheme can tolerate at most one faulty router within a router set. After reconfiguration, the original mesh topology is maintained, so the proposed architecture does not need any support from network-layer routing algorithms. The scheme has been evaluated using three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR increases as the size of the NoC grows, while the relative connection cost decreases at the same time. This characteristic makes the architecture suitable for large-scale NoC designs.
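The statement that each router set tolerates at most one faulty router can be illustrated with a standard k-out-of-n reliability calculation. The sketch below is not the paper's evaluation model, which the abstract does not give. It assumes independent router failures, a spare that is as likely to fail as a regular router, exponentially distributed lifetimes for the MTTF, and a hypothetical set size (`routers_per_set`). It ignores links, PEs, and the reconfiguration logic itself. All function names are illustrative.

```python
def set_reliability(r: float, routers_per_set: int) -> float:
    """Probability that a router set with one spare still works.

    Assumption: the set holds routers_per_set regular routers plus one
    spare, each alive independently with probability r, and the set
    survives when at most one of them has failed.
    """
    n = routers_per_set + 1          # regular routers plus the spare
    all_alive = r ** n
    exactly_one_failed = n * r ** (n - 1) * (1.0 - r)
    return all_alive + exactly_one_failed


def mesh_reliability(r: float, routers_per_set: int, num_sets: int) -> float:
    """All router sets must survive (sets assumed independent)."""
    return set_reliability(r, routers_per_set) ** num_sets


def baseline_reliability(r: float, routers_per_set: int, num_sets: int) -> float:
    """Same mesh without spares: every single router must be alive."""
    return r ** (routers_per_set * num_sets)


def set_mttf(failure_rate: float, routers_per_set: int) -> float:
    """MTTF of one router set under exponential lifetimes.

    For an (n-1)-out-of-n system with per-router failure rate lambda,
    integrating the reliability over time gives (1/n + 1/(n-1)) / lambda.
    """
    n = routers_per_set + 1
    return (1.0 / n + 1.0 / (n - 1)) / failure_rate


if __name__ == "__main__":
    r, k, sets = 0.99, 4, 16         # hypothetical values, not from the paper
    print("with spares   :", mesh_reliability(r, k, sets))
    print("without spares:", baseline_reliability(r, k, sets))
    print("set MTTF (lambda = 1):", set_mttf(1.0, k))
```

Under these assumptions a set of four routers without a spare has an MTTF of 1/(4λ), so the spare raises the per-set figure to 1/(5λ) + 1/(4λ). The paper's reported numbers come from its own experiments and should not be inferred from this simplified model.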




Updated: 2020-04-18