An adaptive failure recovery mechanism based on asymmetric routing for data center networks
The Journal of Supercomputing ( IF 2.5 ) Pub Date : 2020-06-03 , DOI: 10.1007/s11227-020-03337-4
Yong Liu , Huaxi Gu , Kun Wang , Xiaoshan Yu , Yunhao Wang

As the infrastructure of high-performance computing, the data center network plays an important role. Because network failures occur frequently, data center networks demand high-performance, robust, and energy-efficient failure recovery mechanisms. Despite progress, existing work still leaves considerable room for improvement in meeting these requirements. Backup-based failure recovery schemes reserve backup paths in advance, which results in high energy consumption under normal network conditions. To address this energy consumption problem, adaptive failure recovery schemes have been proposed that compute a rerouting path for the traffic on the failed link instead of reserving backup paths, which reduces energy consumption. However, most adaptive failure recovery solutions apply multi-path routing to calculate the rerouting path. Because multi-path routing cannot detect the congestion status of paths in the asymmetric topology caused by link failures, the network becomes congested, which reduces its robustness. In view of this, we design and evaluate AFRM, a novel adaptive failure recovery mechanism that overcomes these challenges. AFRM uses congestion-aware asymmetric routing to calculate the rerouting path and is more robust to topological asymmetry than existing schemes. The asymmetric routing dynamically schedules flows onto the path with the least marginal cost, which makes AFRM much more energy-efficient. Additionally, AFRM achieves fast link failure detection based on hash storage and flow table matching. Evaluations show that, compared with existing schemes, AFRM achieves a trade-off between failure recovery time and energy consumption, reduces flow completion time, and increases network throughput.
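
The abstract states that AFRM's asymmetric routing places each affected flow on the path with the least marginal cost, but it does not give the cost function or how candidate paths are enumerated. The following is therefore only a minimal Python sketch of least-marginal-cost path selection under explicit assumptions, not the authors' implementation: each link's cost is assumed to be the convex congestion function load / (capacity - load), a path's marginal cost is assumed to be the sum of its links' marginal costs, and a path crossing a failed link is treated as unusable. The names `Link`, `path_marginal_cost`, and `reroute`, as well as the small leaf-spine topology, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Link:
    """One directed link; capacity and load share the same (arbitrary) unit."""
    capacity: float
    load: float = 0.0
    up: bool = True  # set to False once the link is detected as failed

    def cost(self, load: float) -> float:
        # Assumed convex congestion cost: grows without bound near saturation.
        if load >= self.capacity:
            return math.inf
        return load / (self.capacity - load)

    def marginal_cost(self, demand: float) -> float:
        # Extra cost caused by placing `demand` more traffic on this link.
        new_load = self.load + demand
        if new_load >= self.capacity:
            return math.inf
        return self.cost(new_load) - self.cost(self.load)


def path_marginal_cost(path, links, demand):
    """Sum of link marginal costs; infinite if the path crosses a failed link."""
    total = 0.0
    for hop in zip(path, path[1:]):
        link = links[hop]
        if not link.up:
            return math.inf
        total += link.marginal_cost(demand)
    return total


def reroute(demand, candidate_paths, links):
    """Place a flow on the least-marginal-cost surviving path; None if none fits."""
    scored = [(path_marginal_cost(p, links, demand), p) for p in candidate_paths]
    best_cost, best_path = min(scored, key=lambda s: s[0])
    if math.isinf(best_cost):
        return None
    for hop in zip(best_path, best_path[1:]):
        links[hop].load += demand  # commit the flow so later decisions see it
    return best_path


if __name__ == "__main__":
    # Hypothetical two-tier topology: leaf L1 reaches leaf L2 via spines S1-S3.
    # S3->L2 has failed, making the topology asymmetric; S1 is already busy.
    links = {
        ("L1", "S1"): Link(10, load=6), ("S1", "L2"): Link(10, load=6),
        ("L1", "S2"): Link(10, load=2), ("S2", "L2"): Link(10, load=2),
        ("L1", "S3"): Link(10),         ("S3", "L2"): Link(10, up=False),
    }
    paths = [["L1", "S1", "L2"], ["L1", "S2", "L2"], ["L1", "S3", "L2"]]
    print(reroute(1.0, paths, links))  # ['L1', 'S2', 'L2']
```

Running the example prints `['L1', 'S2', 'L2']`: the flow avoids the path through the failed S3 link and the already loaded S1 path. Because the assumed cost is convex, a link's marginal cost rises with its load, so repeated calls tend to spread flows across the surviving paths. A hash-based multi-path scheme such as ECMP, by contrast, maps a flow to a path from its header fields alone, without regard to current link load, which is the congestion blindness under asymmetric topologies that the abstract describes.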

Updated: 2020-06-03