ESetStore: An Erasure-Coded Storage System With Fast Data Recovery
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-03-31, DOI: 10.1109/tpds.2020.2983411
Chengjian Liu, Qiang Wang, Xiaowen Chu, Yiu-Wing Leung, Hai Liu

Erasure codes are used extensively in large-scale storage systems because they incur lower storage overhead than triplication-based storage. One key performance issue they introduce is the long time needed to recover from a single failure, an event that occurs constantly in large-scale storage systems. We present ESetStore, a prototype erasure-coded storage system that aims to achieve fast recovery from failures. ESetStore is novel in the following respects. We propose a data placement algorithm, named ESet, that aggregates adequate I/O resources from the available storage servers to recover from each single failure. We design and implement efficient read and write operations for our erasure-coded storage system that make effective use of available I/O and computation resources. We evaluate the performance of ESetStore with extensive experiments on a cluster with 50 storage servers. The results demonstrate that recovery performance grows linearly as more available I/O resources are harvested. Using a parameter we define, recovery I/O parallelism, ESet can achieve optimal recovery performance under some mild conditions, that is, minimal recovery time. Rather than being an alternative to existing solutions such as Partial-parallel-repair (PPR), our work can enhance them to further improve recovery performance.

Updated: 2020-03-31