On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes, ACM Transactions on Storage



ACM Transactions on Storage (IF 1.7), Pub Date: 2020-05-25, DOI: 10.1145/3381832
Oleg Kolosov, Gala Yadgar, Matan Liram, Itzhak Tamo, Alexander Barg


Erasure codes in large-scale storage systems allow recovery of data from a failed node. A recently developed class of codes, locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate efficient recovery scenarios by adding parity blocks to the system. However, these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing LRCs differ in their use of the parity blocks, in their locality semantics, and in their parameter space. Thus, existing theoretical models cannot directly compare different LRCs to determine which code offers the best recovery performance, and at what cost. We perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure’s LRCs, and Optimal-LRCs in light of two new metrics: average degraded read cost and normalized repair cost. We show the tradeoff between these costs and the code’s fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster further demonstrates the different effects of realistic system bottlenecks on the benefit from each LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.
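The locality idea behind LRCs can be illustrated with a toy sketch. The code below is not taken from the paper and does not implement Xorbas, Azure's LRCs, or Optimal-LRCs; it assumes a simplified layout in which data blocks are split into fixed-size local groups, each protected by a single XOR parity block, with no global parities. The function names (`xor_blocks`, `encode_local_parities`, `repair_data_block`) and the block counts are hypothetical, chosen only for illustration. The sketch shows that a single lost data block can be rebuilt by reading only the surviving members of its local group, rather than a number of blocks equal to the full stripe's data size.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_parities(data_blocks, group_size):
    """Compute one XOR parity per local group of `group_size` data blocks."""
    groups = [
        data_blocks[i:i + group_size]
        for i in range(0, len(data_blocks), group_size)
    ]
    return [xor_blocks(group) for group in groups]


def repair_data_block(data_blocks, local_parities, lost_index, group_size):
    """Rebuild one lost data block from its local group only.

    Returns the recovered block and the number of blocks read.
    The lost block itself is never read.
    """
    group = lost_index // group_size
    start = group * group_size
    end = min(start + group_size, len(data_blocks))
    survivors = [data_blocks[i] for i in range(start, end) if i != lost_index]
    helpers = survivors + [local_parities[group]]
    return xor_blocks(helpers), len(helpers)


# Toy stripe (assumed values): 6 data blocks in 2 local groups of 3.
data = [bytes([i] * 4) for i in range(1, 7)]
parities = encode_local_parities(data, group_size=3)
recovered, blocks_read = repair_data_block(
    data, parities, lost_index=4, group_size=3
)
```

In this toy layout the repair reads 3 blocks (two surviving data blocks plus the local parity) instead of all 6 data blocks' worth of data, at the price of storing two extra parity blocks. That is the kind of overhead-versus-repair-cost tradeoff the abstract refers to, although the paper's own metrics are defined differently and are not reproduced here.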


Updated: 2020-05-25
