Distributed storage algorithms with optimal tradeoffs
arXiv - CS - Information Retrieval, Pub Date: 2021-01-13, DOI: arxiv-2101.05223
Michael Luby, Thomas Richardson

One of the primary objectives of a distributed storage system is to reliably store large amounts of source data for long durations using a large number $N$ of unreliable storage nodes, each with $c$ bits of storage capacity. Storage nodes fail randomly over time and are replaced with nodes of equal capacity initialized to zeroes, so bits are erased at some rate $e$. To maintain recoverability of the source data, a repairer continually reads data over a network from nodes at an average rate $r$, and generates and writes data to nodes based on the read data. The distributed storage source capacity is the maximum amount of source data that can be reliably stored for long periods of time. Previous research shows that asymptotically the distributed storage source capacity is at most $\left(1-\frac{e}{2 \cdot r}\right) \cdot N \cdot c$ as $N$ and $r$ grow. In this work we introduce and analyze algorithms for which the distributed storage source capacity is asymptotically at least $\left(1-\frac{e}{2 \cdot r}\right) \cdot N \cdot c$. Thus, this expression captures a fundamental trade-off between network traffic and storage overhead for reliably storing source data.
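
To make the trade-off concrete, the following is a minimal sketch that simply evaluates the expression $\left(1-\frac{e}{2 \cdot r}\right) \cdot N \cdot c$ from the abstract and inverts it for $r$. The function and parameter names are hypothetical and not taken from the paper, $e$ and $r$ are assumed to be expressed in the same units, and because the result is asymptotic the numbers should be read as illustrative only.

```python
def source_capacity_bound(n_nodes: int, c_bits: float,
                          erase_rate: float, read_rate: float) -> float:
    """Evaluate (1 - e / (2 r)) * N * c, the expression quoted in the abstract.

    erase_rate (e) and read_rate (r) are assumed to share the same units.
    """
    if read_rate <= 0:
        raise ValueError("read_rate must be positive")
    fraction = 1.0 - erase_rate / (2.0 * read_rate)
    return fraction * n_nodes * c_bits


def read_rate_for_fraction(erase_rate: float, fraction: float) -> float:
    """Invert the expression: the r at which 1 - e / (2 r) equals `fraction`."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return erase_rate / (2.0 * (1.0 - fraction))


if __name__ == "__main__":
    # Hypothetical system: 1000 nodes of 8e12 bits each, reading at r = 2e.
    print(source_capacity_bound(1000, 8e12, erase_rate=1.0, read_rate=2.0))
    # Read rate needed for source data to fill 75% of raw capacity.
    print(read_rate_for_fraction(erase_rate=1.0, fraction=0.75))
```

Under these assumptions, reading at twice the erasure rate leaves three quarters of the raw capacity $N \cdot c$ for source data, and pushing that fraction closer to 1 requires $r$ to grow without bound.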

Updated: 2021-01-14