Cross-Rack-Aware Single Failure Recovery for Clustered File Systems
IEEE Transactions on Dependable and Secure Computing (IF 7.0), Pub Date: 2020-03-01, DOI: 10.1109/tdsc.2017.2774299
Zhirong Shen, Patrick P. C. Lee, Jiwu Shu, Wenzhong Guo

Improving the performance of single failure recovery has been an active research topic because single failures are prevalent in large-scale storage systems. We argue that when erasure coding is deployed in a clustered file system (CFS), existing single failure recovery designs are limited in three respects: they neglect the bandwidth diversity of the CFS architecture (cross-rack bandwidth is typically scarcer than intra-rack bandwidth), they target specific erasure code constructions, and they give no special treatment to load balancing during recovery. In this paper, we propose CAR, a cross-rack-aware recovery algorithm designed to improve single failure recovery performance in a CFS that uses Reed-Solomon codes for general fault tolerance. For each stripe, CAR finds a recovery solution that retrieves data from the minimum number of racks. It also reduces cross-rack repair traffic by aggregating data within each rack before cross-rack transmission. Furthermore, by considering the recovery of multiple stripes together, CAR balances cross-rack repair traffic across racks. Evaluation results show that CAR effectively reduces both the amount of cross-rack repair traffic and the resulting recovery time.
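The three ideas in the abstract (minimum-rack retrieval per stripe, intra-rack aggregation, and multi-stripe balancing) can be illustrated with a small sketch. It assumes a Reed-Solomon RS(k, m) code, so any k surviving chunks of a stripe suffice to rebuild the lost one. It also assumes the replacement node sits in the failed rack, which makes chunks in that rack free in cross-rack terms, and that each remote rack linearly combines its chunks into one partially decoded chunk before sending it. The function names, the per-rack chunk-count input format, and the greedy balancing rule are hypothetical illustrations, not the paper's exact multi-stripe algorithm.

```python
from collections import Counter
from itertools import combinations


def candidate_rack_sets(surviving, failed_rack, k):
    """Return every minimum-size set of remote racks that, together with the
    surviving chunks inside the failed rack, supplies at least k chunks."""
    local = surviving.get(failed_rack, 0)  # read intra-rack, no cross-rack cost
    need = max(0, k - local)
    if need == 0:
        return [()]
    remote = {r: c for r, c in surviving.items() if r != failed_rack and c > 0}
    # Taking racks in descending order of chunk count reaches `need`
    # with the fewest racks, which fixes the minimum set size.
    total, r_min = 0, 0
    for c in sorted(remote.values(), reverse=True):
        total += c
        r_min += 1
        if total >= need:
            break
    else:
        raise ValueError("fewer than k surviving chunks; stripe is unrecoverable")
    return [combo for combo in combinations(sorted(remote), r_min)
            if sum(remote[r] for r in combo) >= need]


def plan_recovery(stripes, k):
    """Pick one rack set per stripe. With intra-rack aggregation, each chosen
    remote rack sends exactly one partially decoded chunk across racks."""
    load = Counter()  # cross-rack chunks sent by each rack so far
    plan = []
    for surviving, failed_rack in stripes:
        options = candidate_rack_sets(surviving, failed_rack, k)
        # Hypothetical balancing rule: minimise the busiest rack's load,
        # then the existing load on the chosen racks, then the rack names.
        best = min(options, key=lambda combo: (
            max((load[r] + 1 for r in combo), default=0),
            sum(load[r] for r in combo),
            combo,
        ))
        for r in best:
            load[r] += 1
        plan.append(best)
    return plan, load


# RS(4, 2) stripes (k = 4) whose failed chunk sat in rack "A"; each dict gives
# the number of surviving chunks per rack (hypothetical layout).
stripes = [
    ({"A": 1, "B": 2, "C": 1, "D": 1}, "A"),
    ({"A": 1, "B": 2, "C": 1, "D": 1}, "A"),
    ({"A": 1, "B": 1, "C": 2, "D": 1}, "A"),
]
plan, load = plan_recovery(stripes, k=4)
print(plan)        # [('B', 'C'), ('B', 'D'), ('C', 'D')]
print(dict(load))  # {'B': 2, 'C': 2, 'D': 2}
```

In this toy example each stripe costs two cross-rack chunks instead of the three that would be fetched without aggregation, and the balancing rule spreads the six cross-rack transfers evenly over racks B, C and D rather than repeatedly loading rack B.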

Updated: 2020-03-01