A Paxos based algorithm to minimize the overhead of process recovery in consensus

Srinivasan, Sathyanarayanan; Kandukoori, Ramesh

doi:10.1007/s00236-019-00334-w

Original Article
Published: 27 April 2019

Volume 56, pages 433–446, (2019)
Acta Informatica

Abstract

Consensus is a fundamental abstraction in distributed systems, and its solvability is widely discussed in the literature. In message-passing distributed systems that must solve sequential instances of consensus, some processes may become faulty during one instance and recover in a later one. Although consensus algorithms should handle both process failures and process recovery, little work in the literature addresses process recovery. Handling process recovery is not trivial because a recovered process may broadcast a new message that hampers the progress other processes have made towards consensus in their current round, thereby forcing them to start a new round. Therefore, algorithms that are not designed to handle process recovery require \(O(f)\) rounds or \(O(f\delta)\) time to achieve consensus, where at most \(f\) processes can recover and \(\delta\) is the message delay in the system. However, Dutta et al. (International Conference on Dependable Systems and Networks (DSN’05), pp. 22–27, 2005. https://doi.org/10.1109/DSN.2005.54) showed that the overhead of handling process recovery is constant, and their algorithm takes \(17\delta\) time to achieve consensus. In this work, we introduce a new Paxos based algorithm that lowers the upper bound to \(11\delta\). We also show that if all process failures are initial, the upper bound can be further reduced to \(5\delta\). Our algorithm selectively enables processes executing lower rounds to decide irrespective of the presence of higher rounds in the system, minimizing the effect of recovered processes starting a higher round.
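
The preemption effect described in the abstract can be illustrated with a minimal sketch of a classic single-decree Paxos acceptor. This is not the algorithm proposed in the article, whose details are not given here; the class `Acceptor`, its fields, and the function `preemption_demo` are hypothetical names chosen for illustration. The sketch only shows why, in classic Paxos, a recovered process that starts a higher round can force a lower round to be abandoned.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Acceptor:
    """Classic single-decree Paxos acceptor (illustrative, not the paper's algorithm)."""
    promised_round: int = -1              # highest round seen so far
    accepted_round: int = -1              # round of the last accepted estimate
    accepted_value: Optional[str] = None  # last accepted estimate, if any

    def on_prepare(self, rnd: int) -> bool:
        """Promise to ignore rounds lower than rnd; True if the promise is granted."""
        if rnd > self.promised_round:
            self.promised_round = rnd
            return True
        return False

    def on_accept(self, rnd: int, estimate: str) -> bool:
        """Accept (round, estimate) unless a higher round was already promised."""
        if rnd >= self.promised_round:
            self.promised_round = rnd
            self.accepted_round = rnd
            self.accepted_value = estimate
            return True
        return False


def preemption_demo() -> List[bool]:
    """Replay a run in which a recovered process preempts round 1 with round 2."""
    acc = Acceptor()
    events = [
        acc.on_prepare(1),       # leader of round 1 obtains a promise
        acc.on_prepare(2),       # a recovered process starts round 2
        acc.on_accept(1, "v"),   # the round 1 Accept is now rejected
        acc.on_accept(2, "w"),   # only round 2 can still make progress
    ]
    return events


if __name__ == "__main__":
    print(preemption_demo())
```

In this sketch the round 1 leader has to restart with a higher round once its Accept is rejected. If up to \(f\) recovered processes each trigger such a restart, the number of rounds can grow with \(f\), which is the \(O(f)\) behaviour the abstract attributes to algorithms that are not designed for recovery.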

Notes

The R in R-good run and R-nice run stands for recoverable: these runs allow failed processes to recover and participate in consensus.
Messages broadcast by a failed process introduce additional overhead when solving consensus: a process that receives such a message does not know that the sender has failed, so it does not discard the message but processes it and changes its state accordingly.
An Accept message contains a round number and an estimate.
The maximum value of t is f.
Including processes that recover after \(T_S\): because they wait for \(\delta\) time before starting a round after recovery, they receive the dedicated message containing hr from other processes.

References

Ailijiang, A., Charapko, A., Demirbas, M.: Consensus in the cloud: Paxos systems demystified. In: IEEE 25th International Conference on Computer Communication and Networks (ICCCN), pp. 1–10 (2016)
Alagappan, R., Ganesan, A., Lee, E., Albarghouthi, A., Chidambaram, V., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H.: Protocol-aware recovery for consensus-based storage. In: 16th USENIX Conference on File and Storage Technologies (FAST 18), USENIX Association, pp. 15–32 (2018)
Alistarh, D., Gilbert, S., Guerraoui, R., Travers, C.: How to Solve Consensus in the Smallest Window of Synchrony, pp. 32–46. Springer, Berlin. https://doi.org/10.1007/978-3-540-87779-0_3 (2008)
Batra, R.: Implementation and evaluation of paxos and raft distributed consensus protocols. PhD thesis (2017)
Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM (JACM) 43(4), 685–722 (1996)
Chandra, T.D., Griesemer, R., Redstone, J.: Paxos made live: an engineering perspective. In: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing, pp. 398–407. ACM (2007)
Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM (JACM) 34(1), 77–97 (1987)
Dutta, P., Guerraoui, R., Lamport, L.: How fast can eventual synchrony lead to consensus? In: International Conference on Dependable Systems and Networks (DSN’05), pp. 22–27. https://doi.org/10.1109/DSN.2005.54 (2005)
Dutta, P., Guerraoui, R., Keidar, I.: The overhead of consensus failure recovery. Distrib. Comput. 19(5–6), 373–386 (2007)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988). https://doi.org/10.1145/42282.42283
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM (JACM) 32(2), 374–382 (1985)
Keidar, I., Rajsbaum, S.: On the cost of fault-tolerant consensus when there are no faults: preliminary version. ACM SIGACT News 32(2), 45–63 (2001)
Lamport, L., et al.: Paxos made simple. ACM Sigact News 32(4), 18–25 (2001)
Lorch, J.R., Adya, A., Bolosky, W.J., Chaiken, R., Douceur, J.R., Howell, J.: The smart way to migrate replicated stateful services. ACM SIGOPS Oper. Syst. Rev. ACM 40, 103–115 (2006)
Lynch, N.A.: Distributed algorithms. Elsevier, Amsterdam (1996)
Moraru, I., Andersen, D.G., Kaminsky, M.: There is more consensus in egalitarian parliaments. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles, pp. 358–372. ACM (2013)
Ongaro, D., Ousterhout, J.K.: In search of an understandable consensus algorithm. In: USENIX Annual Technical Conference, pp. 305–319 (2014)
Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM (JACM) 27(2), 228–234 (1980)
Sutra, P., Shapiro, M.: Fast genuine generalized consensus. In: 30th IEEE Symposium on Reliable Distributed Systems (SRDS), pp. 255–264. IEEE (2011)
Van Renesse, R., Altinbuken, D.: Paxos made moderately complex. ACM Comput. Surv. (CSUR) 47(3), 42 (2015)
Van Renesse, R., Schiper, N., Schneider, F.B.: Vive la différence: Paxos vs. viewstamped replication vs. zab. IEEE Trans. Depend. Secur. Comput. 12(4), 472–484 (2015)
Mao, Y., Junqueira, F.P., Marzullo, K.: Mencius: building efficient replicated state machines for WANs. In: Proceedings of the Symposium on Operating System Design and Implementation, pp. 369–384 (2008)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana, India
Sathyanarayanan Srinivasan & Ramesh Kandukoori

Corresponding author

Correspondence to Sathyanarayanan Srinivasan.

About this article

Cite this article

Srinivasan, S., Kandukoori, R. A Paxos based algorithm to minimize the overhead of process recovery in consensus. Acta Informatica 56, 433–446 (2019). https://doi.org/10.1007/s00236-019-00334-w

Received: 19 June 2018
Accepted: 13 April 2019
Published: 27 April 2019
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s00236-019-00334-w
