Dynamic replication factor model for Linux containers-based cloud systems
The Journal of Supercomputing (IF 3.3), Pub Date: 2020-01-16, DOI: 10.1007/s11227-020-03158-5
Heithem Abbes, Thouraya Louati, Christophe Cérin

Infrastructure-as-a-service container-based virtualization is gaining interest as a platform for running distributed applications. With the increasing scale of cloud architectures, faults are becoming a frequent occurrence, which makes availability a true challenge. Replication of checkpoints, containers, or data is a method for surviving failures and increasing availability. Following a node failure, fault-tolerant cloud systems restart the failed containers on a new node from distributed container images (or checkpoints). With a high failure rate, some replicas can be lost. In some cases it is therefore worthwhile to increase the replication factor and to find a trade-off between being able to restart all failed containers and the storage overhead. This paper addresses the issue of adapting the replication factor and contributes a novel replication factor modeling approach that predicts the right replication factor using prediction techniques. These techniques are based on experimental modeling, which analyzes data collected from different executions. We use a regression technique to find the relation between availability and the number of replicas. Experiments on the Grid’5000 testbed, using a real fault-tolerant cloud system, demonstrate the benefits of our proposal in satisfying the availability requirement.
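As a rough illustration of the idea of relating availability to the number of replicas by regression, the sketch below fits a line to log(1 - availability) against the replica count and then picks the smallest replication factor that meets an availability target. The abstract does not state which regression form the authors use, so the log-linear form (which follows if replicas fail independently), the function names, and the sample values are all assumptions made for illustration; the samples are synthetic placeholders, not results from the paper.

```python
import math

# Synthetic placeholder measurements (replicas, observed availability).
# These are NOT data from the paper; they only exercise the code.
SAMPLES = [(1, 0.80), (2, 0.96), (3, 0.992), (4, 0.9984)]


def fit_log_unavailability(samples):
    """Ordinary least-squares fit of log(1 - availability) = a + b * replicas.

    Assumed model: if each replica is lost independently with probability p,
    unavailability is p ** replicas, which is linear in replicas after a log.
    Every availability value must be strictly below 1.0.
    """
    xs = [r for r, _ in samples]
    ys = [math.log(1.0 - a) for _, a in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_availability(model, replicas):
    """Availability predicted by the fitted model for a replica count."""
    intercept, slope = model
    return 1.0 - math.exp(intercept + slope * replicas)


def min_replication_factor(model, target, max_replicas=10):
    """Smallest replica count whose predicted availability meets the target.

    Returns None if no count up to max_replicas is sufficient.
    """
    for replicas in range(1, max_replicas + 1):
        if predict_availability(model, replicas) >= target:
            return replicas
    return None


if __name__ == "__main__":
    model = fit_log_unavailability(SAMPLES)
    print(min_replication_factor(model, target=0.999))
```

Capping the search at `max_replicas` reflects the trade-off the abstract describes: each extra replica adds storage overhead, so a target that cannot be met within the cap is reported as unmet instead of being satisfied at any cost.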

Updated: 2020-01-16