CURE: A High-Performance, Low-Power, and Reliable Network-on-Chip Design Using Reinforcement Learning
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2020-09-01, DOI: 10.1109/tpds.2020.2986297
Ke Wang, Ahmed Louri

We propose CURE, a deep reinforcement learning (DRL)-based network-on-chip (NoC) design framework that simultaneously reduces network latency, improves energy efficiency, and tolerates transient errors and permanent faults. CURE combines several architectural innovations with a DRL-based hardware controller that manages design complexity and optimizes trade-offs. First, we propose reversible multi-function adaptive channels (RMCs) to reduce NoC power consumption and network latency. Second, we implement new fault-secure adaptive error-correction hardware in each router to enhance reliability against both transient errors and permanent faults. Third, we propose a router power-gating and bypass design that powers off NoC components to reduce power and extend chip lifespan. Further, to handle the complex dynamic interactions among these techniques, we use DRL to train a proactive control policy that improves fault tolerance, reduces power consumption, and improves performance. Simulation using the PARSEC benchmark shows that CURE reduces end-to-end packet latency by 39 percent, improves energy efficiency by 92 percent, and lowers static and dynamic power consumption by 24 and 38 percent, respectively, over conventional solutions. Measured by mean time to failure, CURE is 7.7× more reliable than the conventional NoC design.
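
To make the idea of a learned control policy concrete, here is a minimal sketch, not the authors' method. It swaps in plain tabular Q-learning for the deep RL controller, and everything in it is an assumption made for illustration: the state (traffic load and error level, each bucketed into low/medium/high), the four router modes in `ACTIONS`, the reward values in `toy_reward`, and the random toy environment. The abstract does not specify CURE's actual state, action set, or reward.

```python
import random

# Hypothetical router modes; the abstract does not list CURE's actual action set.
ACTIONS = ["power_gate", "no_ecc", "light_ecc", "strong_ecc"]
ECC_STRENGTH = {"no_ecc": 0, "light_ecc": 1, "strong_ecc": 2}
LEVELS = (0, 1, 2)  # low, medium, high


def toy_reward(load, error_level, action):
    """Made-up reward for illustration; not taken from the paper."""
    if action == "power_gate":
        # Gating an idle router saves power; gating a busy one stalls traffic.
        return 2.0 if load == 0 else -4.0
    strength = ECC_STRENGTH[action]
    uncorrected = max(0, error_level - strength)
    # Penalize uncorrected errors and the overhead of stronger error correction.
    return -2.0 * uncorrected - 1.0 * strength


def train(steps=50_000, alpha=0.05, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the greedy policy."""
    rng = random.Random(seed)
    q = {(l, e): {a: 0.0 for a in ACTIONS} for l in LEVELS for e in LEVELS}
    state = (rng.choice(LEVELS), rng.choice(LEVELS))
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(q[state], key=q[state].get)  # exploit
        reward = toy_reward(state[0], state[1], action)
        # Toy environment: the next load/error condition is drawn at random.
        next_state = (rng.choice(LEVELS), rng.choice(LEVELS))
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return {s: max(q[s], key=q[s].get) for s in q}


policy = train()
for (load, error_level), mode in sorted(policy.items()):
    print(f"load={load} errors={error_level} -> {mode}")
```

Under these assumed rewards, the learned policy should power-gate idle routers and otherwise match error-correction strength to the observed error level. A real controller of the kind the abstract describes would replace the lookup table with a neural network and the toy reward with measured latency, power, and reliability signals.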

Updated: 2020-09-01