A Paxos based algorithm to minimize the overhead of process recovery in consensus
Acta Informatica (IF 0.4), Pub Date: 2019-04-27, DOI: 10.1007/s00236-019-00334-w
Sathyanarayanan Srinivasan, Ramesh Kandukoori

Consensus is a fundamental abstraction in distributed systems, and its solvability is widely discussed in the literature. In message-passing distributed systems that must solve sequential instances of consensus, some processes may become faulty during one instance and recover later in another. Although consensus algorithms should be equipped to handle both process failures and process recovery, little work in the literature addresses process recovery. Handling process recovery is not trivial, because a recovered process may broadcast a new message that hampers the progress other processes have made towards achieving consensus in their current round, thereby forcing them to start a new round. As a result, algorithms that are not designed to handle process recovery require $O(f)$ rounds, or $O(f\delta)$ time, to achieve consensus, where at most $f$ processes can recover and $\delta$ is the message delay in the system. However, Dutta et al. (in: International Conference on Dependable Systems and Networks (DSN'05), pp 22–27, 2005. https://doi.org/10.1109/DSN.2005.54) showed that the overhead of handling process recovery is constant, and their algorithm takes $17\delta$ time to achieve consensus. In this work, we introduce a new Paxos-based algorithm that lowers the upper bound to $11\delta$. We also show that if all process failures are initial, the upper bound can be further reduced to $5\delta$. Our algorithm selectively enables processes executing lower rounds to decide irrespective of the presence of higher rounds in the system, minimizing the effect of recovered processes starting a higher round.
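As a rough illustration of the time bounds quoted in the abstract, the following Python sketch evaluates them for a given message delay. It is not the authors' algorithm, only arithmetic on the stated bounds. The names `Bounds`, `constant_bounds`, and `recovery_unaware_estimate` are hypothetical, and the constant `c` hidden in the O(fδ) bound is not given in the abstract, so the default `c = 1.0` is purely an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Upper bounds on decision time quoted in the abstract (same unit as delta)."""
    dutta_et_al: float       # 17 * delta (Dutta et al., DSN'05)
    this_paper: float        # 11 * delta (the new Paxos-based algorithm)
    initial_failures: float  # 5 * delta (all process failures are initial)


def constant_bounds(delta: float) -> Bounds:
    """Evaluate the three constant-overhead bounds for message delay `delta`."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return Bounds(17 * delta, 11 * delta, 5 * delta)


def recovery_unaware_estimate(f: int, delta: float, c: float = 1.0) -> float:
    """Illustrative c * f * delta stand-in for the O(f * delta) bound.

    ASSUMPTION: `c` is a made-up constant; the abstract only states the
    asymptotic order, not the constant factor.
    """
    if f < 0 or delta <= 0:
        raise ValueError("f must be non-negative and delta positive")
    return c * f * delta
```

For instance, with `delta = 10` (say, milliseconds), `constant_bounds(10)` gives 170, 110, and 50 for the three bounds. Under the assumed `c = 1.0`, the recovery-unaware estimate grows linearly with `f`, so it exceeds the 110 bound once more than 11 processes can recover.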

Updated: 2019-04-27