Scalable and Adaptive Data Replica Placement for Geo-Distributed Cloud Storages, IEEE Transactions on Parallel and Distributed Systems


IEEE Transactions on Parallel and Distributed Systems (IF 5.6) Pub Date: 2020-01-21, DOI: 10.1109/tpds.2020.2968321
Kaiyang Liu, Jun Peng, Jingrong Wang, Weirong Liu, Zhiwu Huang, Jianping Pan

In geo-distributed cloud storage systems, data replication is widely used to provide high data reliability and availability to a growing number of users around the world. Optimizing data replica placement has become one of the fundamental problems in reducing inter-node traffic and the system overhead of accessing associated data items. In the big data era, traditional solutions may suffer from long running times and large overheads when handling the increasing scale of data items with time-varying user requests. Therefore, novel offline community discovery and online community adjustment schemes are proposed to solve the replica placement problem in a scalable and adaptive way. The offline scheme finds a replica placement solution based on the average read/write rates over a certain period of time. Scalability is achieved because 1) the computational complexity is linear in the number of data items and 2) the data-node communities can evolve in parallel for distributed replica placement. Furthermore, the online scheme adapts to bursty data requests without completely overriding the existing replica placement. Driven by real-world data traces, extensive performance evaluations demonstrate the effectiveness of our design in handling large-scale datasets.
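
To make the trade-off in the abstract concrete, the following is a minimal sketch of how average read/write rates can drive a replica placement decision for a single data item. It is not the paper's community discovery scheme: the cost model (reads go to the nearest replica, writes go to every replica), the exhaustive search, and all names (`placement_cost`, `best_placement`, the node labels, rates, and distances) are assumptions made purely for illustration. It also ignores associations between data items, which the paper explicitly considers.

```python
from itertools import combinations


def placement_cost(replicas, read_rate, write_rate, dist):
    """Toy inter-node traffic cost of storing one item on `replicas`.

    Assumed model (not taken from the paper):
      - every read is served by the closest replica,
      - every write is propagated to all replicas.
    """
    read_cost = sum(
        rate * min(dist[node][rep] for rep in replicas)
        for node, rate in read_rate.items()
    )
    write_cost = sum(
        rate * sum(dist[node][rep] for rep in replicas)
        for node, rate in write_rate.items()
    )
    return read_cost + write_cost


def best_placement(nodes, k, read_rate, write_rate, dist):
    """Exhaustively pick the k replica nodes with the lowest toy cost.

    Only practical for tiny instances; used here to keep the sketch exact.
    """
    return min(
        combinations(nodes, k),
        key=lambda reps: placement_cost(reps, read_rate, write_rate, dist),
    )


# Hypothetical three-node system with symmetric inter-node distances.
nodes = ["A", "B", "C"]
dist = {
    "A": {"A": 0, "B": 1, "C": 4},
    "B": {"A": 1, "B": 0, "C": 2},
    "C": {"A": 4, "B": 2, "C": 0},
}
# Hypothetical average request rates per node over some period.
read_rate = {"A": 10, "B": 1, "C": 5}
write_rate = {"A": 1, "B": 0, "C": 1}

single = best_placement(nodes, 1, read_rate, write_rate, dist)
double = best_placement(nodes, 2, read_rate, write_rate, dist)
print(single, placement_cost(single, read_rate, write_rate, dist))  # ('B',) 23
print(double, placement_cost(double, read_rate, write_rate, dist))  # ('A', 'C') 9
```

In this toy instance a second replica lowers the total cost because the read savings outweigh the extra write propagation; with heavier write rates the balance would shift the other way. Exhaustive search grows combinatorially with the number of nodes, which is the kind of cost a scalable scheme such as the one described above is meant to avoid.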

Updated: 2020-01-21
