On Byzantine fault tolerance in multi-master Kubernetes clusters
Future Generation Computer Systems (IF 6.2), Pub Date: 2020-04-07, DOI: 10.1016/j.future.2020.03.060
Gor Mack Diouf, Halima Elbiaze, Wael Jaafar

Docker container virtualization technology is being widely adopted in cloud computing environments because it is lightweight and efficient. However, it requires adequate control and management via an orchestrator. As a result, cloud providers are adopting the open-source Kubernetes platform as the standard orchestrator of containerized applications. To ensure application availability, Kubernetes relies on the replication mechanism of the Raft protocol. Despite its simplicity, Raft assumes that machines fail only by shutting down. Shutdown is rarely the only cause of a machine's malfunction. Indeed, software errors or malicious attacks can cause machines to exhibit Byzantine (i.e., arbitrary) behavior and thereby compromise the correctness and availability of the replication protocol. In this paper, we propose a Kubernetes multi-Master Robust (KmMR) platform to overcome this limitation. KmMR is based on the adaptation and integration of the BFT-SMaRt fault-tolerant replication protocol into the Kubernetes environment. Unlike Raft, BFT-SMaRt is resistant to both Byzantine and non-Byzantine faults. Experimental results show that KmMR is able to guarantee the continuity of services even when the total number of tolerated faults is exceeded. In addition, under that condition, KmMR achieves on average a consensus time 1000 times shorter than that of the conventional platform (with Raft). Finally, we show that KmMR incurs only a small additional cost in resource consumption compared to the conventional platform.
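The contrast the abstract draws between crash-only and Byzantine fault models can be made concrete with the standard replica-count bounds. The abstract does not state these numbers; the sketch below assumes the textbook bounds of n >= 2f + 1 replicas for majority-quorum crash-fault protocols such as Raft and n >= 3f + 1 for Byzantine fault tolerance in the style of BFT-SMaRt. All function names are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: textbook fault-tolerance bounds, not taken from
# the paper. All function names here are hypothetical.

def max_crash_faults(n: int) -> int:
    """Crash faults tolerated by a majority-quorum protocol (assumes n >= 2f + 1)."""
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Byzantine faults tolerated by a classic BFT protocol (assumes n >= 3f + 1)."""
    return (n - 1) // 3


def crash_quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def byzantine_quorum(n: int) -> int:
    """Smallest quorum such that any two quorums share a correct replica:
    ceil((n + f + 1) / 2), with f the Byzantine faults tolerated by n replicas."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(
            f"n={n}: crash f={max_crash_faults(n)}, quorum={crash_quorum(n)}; "
            f"Byzantine f={max_byzantine_faults(n)}, quorum={byzantine_quorum(n)}"
        )
```

Under these assumptions, a three-master cluster survives one crashed master but no Byzantine master, while tolerating a single Byzantine master requires at least four replicas.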




Updated: 2020-04-07