
Preserving stabilization while practically bounding state space using incorruptible partially synchronized clocks


Abstract

Stabilization is a key dependability property for dealing with unanticipated transient faults, as it guarantees that even in the presence of such faults, the system will recover to states where it satisfies its specification. One of the desirable attributes of stabilization is the use of bounded space for each variable. In this paper, we present an algorithm that transforms a stabilizing program that uses variables with unbounded domains into a stabilizing program that uses bounded variables by using partially synchronized physical time. Specifically, our algorithm relies on a bounded clock drift \(\epsilon \) among processes and on message delivery that either delivers the message within time \(\delta \) or loses it. If we let \(\epsilon \) be as much as 100 s and \(\delta \) be as much as 1 h, this property is satisfied by any practical system. While non-stabilizing programs (that do not handle transient faults) can deal with unbounded variables by assigning large enough but bounded space, stabilizing programs, which need to deal with arbitrary transient faults, cannot do the same, since a transient fault may corrupt a variable to its maximum value. We show that our transformation algorithm is applicable to several problems including logical clocks, vector clocks, mutual exclusion, and diffusing computations. Moreover, our approach can also be used to bound the counters used in an earlier work by Katz and Perry for adding stabilization to a non-stabilizing program. By combining our algorithm with the work by Katz and Perry and by assuming incorruptible partially synchronized clocks, it would be possible to provide stabilization for a rich class of problems by assigning large enough but bounded space for variables.


Notes

  1. The results can also be extended to cases where physical clocks are eventually within \(\epsilon \) of each other. However, in this case, the total time for convergence would also include the time required to restore the clocks to be within \(\epsilon \) of each other. For the sake of simplicity, this issue is considered to be beyond the scope of the paper.

  2. The variable channel contains messages, and each message m in it is associated with a timestamp cl.m. There can be other details associated with a message, such as the id of the sender process and the id of the receiver process, but since the timestamp is the only information relevant to our algorithm, we refer to channel as a variable that contains message timestamps.

  3. For the origins of the constants 3 and 11, we refer the reader to the text at the beginning of Sect. 5.5.

References

  1. Alon, N., Attiya, H., Dolev, S., Dubois, S., Potop-Butucaru, M., Tixeuil, S.: Practically stabilizing SWMR atomic memory in message-passing systems. J. Comput. Syst. Sci. 81(4), 692–701 (2015). https://doi.org/10.1016/j.jcss.2014.11.014

  2. Arora, A., Gouda, M.G.: Distributed reset. IEEE Trans. Comput. 43(9), 1026–1038 (1994)

  3. Arora, A., Kulkarni, S., Demirbas, M.: Resettable vector clocks. In: Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, pp. 269–278. ACM, NY (2000). https://doi.org/10.1145/343477.343628

  4. Arora, A., Kulkarni, S.S.: Designing masking fault-tolerance via nonmasking fault-tolerance. IEEE Trans. Softw. Eng. 24(6), 435–450 (1998). https://doi.org/10.1109/32.689401

  5. Awerbuch, B., Patt-Shamir, B., Varghese, G.: Bounding the unbounded. In: Proceedings IEEE INFOCOM ’94, The Conference on Computer Communications, Thirteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Networking for Global Communications, Toronto, Ontario, Canada, June 12–16, 1994, pp. 776–783 (1994). https://doi.org/10.1109/INFCOM.1994.337661

  6. Blanchard, P., Dolev, S., Beauquier, J., Delaët, S.: Practically Self-stabilizing Paxos Replicated State-Machine, pp. 99–121. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09581-3_8

  7. Chandy, K.M., Lamport, L.: Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3(1), 63–75 (1985). https://doi.org/10.1145/214451.214456

  8. Chandy, K.M., Misra, J.: Parallel Program Design—A Foundation. Addison-Wesley, Reading (1989)

  9. Dasgupta, A., Ghosh, S., Xiao, X.: Probabilistic fault-containment. In: Masuzawa, T., Tixeuil, S. (eds.) Stabilization, Safety, and Security of Distributed Systems, 9th International Symposium, 2007, Paris, France, November 14–16, 2007, Proceedings, Lecture Notes in Computer Science, vol. 4838, pp. 189–203. Springer, New York (2007). https://doi.org/10.1007/978-3-540-76627-8_16

  10. Dijkstra, E.W.: Self-stabilizing systems in spite of distributed control. Commun. ACM 17(11), 643–644 (1974)

  11. Dijkstra, E.W., Scholten, C.S.: Termination detection for diffusing computations. Inf. Process. Lett. 11(1), 1–4 (1980)

  12. Dolev, S., Georgiou, C., Marcoullis, I., Schiller, E.M.: Self-stabilizing virtual synchrony. In: Proceedings of the Stabilization, Safety, and Security of Distributed Systems—17th International Symposium, SSS 2015, Edmonton, AB, Canada, August 18–21, 2015, pp. 248–264 (2015). https://doi.org/10.1007/978-3-319-21741-3_17

  13. Fidge, C.J.: Timestamps in message-passing systems that preserve the partial ordering. In: Proceedings of the 11th Australian Computer Science Conference, vol. 10, no. 1, pp. 56–66 (1988)

  14. Fischer, M.J., Lynch, N.A., Paterson, M.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985). https://doi.org/10.1145/3149.214121

  15. Garcia-Luna-Aceves, J.J.: Loop-free routing using diffusing computations. IEEE/ACM Trans. Netw. 1(1), 130–141 (1993). https://doi.org/10.1109/90.222913

  16. Ghosh, S.: Distributed Systems: An Algorithmic Approach. Chapman & Hall, London (2014)

  17. Ghosh, S., Gupta, A., Herman, T., Pemmaraju, S.V.: Fault-containing self-stabilizing distributed protocols. Distrib. Comput. 20(1), 53–73 (2007). https://doi.org/10.1007/s00446-007-0032-2

  18. Katz, S., Perry, K.J.: Self-stabilizing extensions for message-passing systems. Distrib. Comput. 7(1), 17–26 (1993). https://doi.org/10.1007/BF02278852

  19. Kulkarni, S.S., Arora, A.: Multitolerance in distributed reset. Chicago J. Theor. Comput. Sci. (1998). http://cjtcs.cs.uchicago.edu/articles/1998/4/contents.html

  20. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21(7), 558–565 (1978). https://doi.org/10.1145/359545.359563

  21. Lamport, L., Lynch, N.A.: Distributed Computing: Models and Methods. MIT Press, Cambridge (1990)

  22. Lee, S., Muhammad, R.M., Kim, C.: A Leader Election Algorithm Within Candidates on Ad Hoc Mobile Networks, pp. 728–738. Springer, Berlin (2007)

  23. Mattern, F.: Virtual time and global states of distributed systems. In: Cosnard M (ed.) Parallel and Distributed Algorithms, pp. 215–226. North-Holland (1989)

  24. Valapil, V.T., Kulkarni, S.S.: Preserving stabilization while practically bounding state space. In: 13th European Dependable Computing Conference, EDCC 2017, Geneva, Switzerland, September 4–8, 2017, pp. 26–33 (2017). https://doi.org/10.1109/EDCC.2017.13

  25. Vasudevan, S., Kurose, J.F., Towsley, D.F.: Design and analysis of a leader election algorithm for mobile ad hoc networks. In: 12th IEEE International Conference on Network Protocols, Berlin, Germany, pp. 350–360. IEEE Computer Society (2004). https://doi.org/10.1109/ICNP.2004.1348124

  26. Yingchareonthawornchai, S., Kulkarni, S.S., Demirbas, M.: Analysis of bounds on hybrid vector clocks. In: OPODIS 2015, December 14–17, 2015, Rennes, France, pp. 34:1–34:17 (2015). https://doi.org/10.4230/LIPIcs.OPODIS.2015.34

Download references

Author information

Corresponding author

Correspondence to Vidhya Tekken Valapil.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is partially supported by NSF CNS 1329807, NSF CNS 1318678, and XPS 1533802. This is an extension of the previous work that appeared in the 13th European Dependable Computing Conference (EDCC), 2017.

Appendix

In this section, we present a detailed analysis of the effect on the bounds of the counters derived by our algorithm when clocks (of the processes and the global clock) differ from each other by more than one region. This section also includes a summary table of the notations used in this paper.

1.1 Proof of our claim on the effect of clocks differing by multiple regions

Our transformation algorithm was based on \(nReg=1\) in Definition 15. For a distributed system of n processes where the physical clocks of the processes are guaranteed to be synchronized within 0.1 s of each other and w.r.t. the global clock, to achieve \(nReg=1\) the designer could choose \(\mathcal {RS} = 0.1\) s.

In this section, we analyze the effect of varying nReg on the range or size of the counters. In other words, we would like to answer the question “What is the effect on the range of the counters (i.e., the bound on the counters determined by the transformation algorithm) if the clocks of processes are more than one region apart (w.r.t. each other and w.r.t. the global clock), i.e., \(nReg > 1\)?” Allowing \(nReg > 1\) could help the designer choose a smaller \(\mathcal {RS} \). For instance, in the example discussed above, if the regions identified by the physical clocks of the processes (w.r.t. each other and w.r.t. the global clock) are allowed to differ by at most 100 regions, i.e., \(nReg=100\), then the designer could choose \(\mathcal {RS} = 1\) ms.

Observe that when nReg changes, the variables \(max_{inc} \) and \(max_r\) vary accordingly. For example, although the system described above with \(nReg = 1\) and \(\mathcal {RS} = 0.1\) s can be equivalently modeled with \(nReg = 100\) and \(\mathcal {RS} = 1\) ms, the values of \(max_r\) and \(max_{inc} \) differ in the two settings. If the growth in the counters is distributed uniformly, and if \(max_{inc} =100\) in the first setting, then \(max_{inc} \) would be 1 in the second setting. Similarly, if \(max_r=1\) in the first setting, then \(max_r\) could be 100 in the second setting, i.e., with \(nReg=100\) and \(\mathcal {RS} = 1\) ms.
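
To make this rescaling concrete, the following minimal sketch (ours, not from the paper; the function name rescale is our own) reproduces the example above under the uniform-growth assumption:

```python
# Minimal sketch (not from the paper): how max_inc and max_r rescale when the
# same system is modeled with a smaller region size, assuming the growth in
# counters is distributed uniformly over time.

def rescale(max_inc, max_r, n_reg):
    """Given max_inc and max_r for nReg = 1, return the corresponding
    max_inc' and max_r' when the region size is shrunk by a factor n_reg."""
    return max_inc / n_reg, max_r * n_reg

# nReg = 1, RS = 0.1 s  versus  nReg = 100, RS = 1 ms:
print(rescale(max_inc=100, max_r=1, n_reg=100))  # -> (1.0, 100)
```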

1.1.1 Bound for multiple regions

If \(nReg = 1\), i.e., the clocks of any two processes differ from each other by at most one region and the clock of any process differs from the global clock by at most one region, we try to ensure that any free counter is in the range \([3\,r\,max_{inc} \,..\,3(r+1)max_{inc} + 2max_{inc}-1 ]\) and (derived from the range of the free counters) any dependent counter is in the range \([3(r-2-max_r)max_{inc} \,..\,3(r+1)max_{inc} + 2max_{inc}-1 ]\). The size of this range of dependent counters is \(max_{inc} (11 + 3max_r)\). Based on this size, each unbounded counter in the program is maintained in modulo B arithmetic. In other words, the values of the unbounded counters of the original program are bounded by B in the transformed program, where

$$\begin{aligned} B=3[max_{inc} (11 + 3max_r)] \end{aligned}$$
(1)
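
For concreteness, here is a small illustrative sketch of these ranges and of Eq. (1); the helper names are ours and this is not the paper's code:

```python
# Illustrative helpers (ours) for the nReg = 1 case described above.

def free_counter_range(r, max_inc):
    # [3*r*max_inc .. 3*(r+1)*max_inc + 2*max_inc - 1]
    return 3 * r * max_inc, 3 * (r + 1) * max_inc + 2 * max_inc - 1

def dependent_counter_range(r, max_inc, max_r):
    # [3*(r-2-max_r)*max_inc .. 3*(r+1)*max_inc + 2*max_inc - 1]
    return 3 * (r - 2 - max_r) * max_inc, 3 * (r + 1) * max_inc + 2 * max_inc - 1

def bound_single_region(max_inc, max_r):
    # Eq. (1): B = 3 * [max_inc * (11 + 3*max_r)]
    return 3 * max_inc * (11 + 3 * max_r)

lo, hi = dependent_counter_range(r=10, max_inc=100, max_r=1)
assert hi - lo + 1 == 100 * (11 + 3 * 1)          # size of the dependent range
print(bound_single_region(max_inc=100, max_r=1))  # -> 4200
```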

If \(nReg = 2\), i.e., the clocks of any two processes are allowed to differ from each other by at most 2 regions and the clock of any process is allowed to differ from the global clock by at most 2 regions, then we try to ensure that any free counter is in the range \([4\,r\,max_{inc} ' \,..\, 4(r+1)max_{inc} '+3max_{inc} '-1]\) and (derived from the range of the free counters) any dependent counter is in the range \([4(r-3-max_r')max_{inc} ' \,..\, 4(r+1)max_{inc} '+3max_{inc} '-1]\).

Recall that when nReg changes, the variables \(max_{inc} \) and \(max_r\) vary accordingly; their values in the new setting are denoted by \(max_{inc} '\) and \(max_r'\) in the above ranges. Now, generalizing the formulas presented above, i.e., if the clocks of any two processes are allowed to differ from each other by at most nReg regions and the clock of any process is allowed to differ from the global clock by at most nReg regions, we try to ensure that any free counter is in the range \([(nReg+2)\,r\,max_{inc} '\,..\,(nReg+2)(r+1)max_{inc} ' + (nReg+1)max_{inc} ' -1 ]\) and any dependent counter is in the range \([(nReg+2)(r-(nReg+1)-max_r')max_{inc} ' \,..\,(nReg+2)(r+1)max_{inc} ' + (nReg+1)max_{inc} ' -1 ]\). The size of this range of dependent counters is \(max_{inc} '(nReg^2 + 5(nReg+1)+max_r'(nReg+2))\). Based on this size, each unbounded counter in the program would be maintained in modulo B arithmetic, i.e., the values of the unbounded counters of the original program are bounded by B in the transformed program, where

$$\begin{aligned} B= 3[max_{inc} '(nReg^2 + 5(nReg+1)+max_r'(nReg+2))] \end{aligned}$$
(2)
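
The generalized bound can be written as a one-line helper; the following sketch (ours) also checks that Eq. (2) collapses to Eq. (1) when \(nReg = 1\):

```python
# Illustrative sketch (ours) of the generalized bound in Eq. (2).

def bound_multi_region(n_reg, max_inc_p, max_r_p):
    # B = 3 * [max_inc' * (nReg^2 + 5*(nReg + 1) + max_r'*(nReg + 2))]
    return 3 * max_inc_p * (n_reg ** 2 + 5 * (n_reg + 1) + max_r_p * (n_reg + 2))

# With nReg = 1 the general formula reduces to Eq. (1): 3*max_inc*(11 + 3*max_r).
assert bound_multi_region(1, max_inc_p=100, max_r_p=1) == 3 * 100 * (11 + 3 * 1)
```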

1.1.2 Analyzing bounds for counters when processes differ by multiple regions versus a single region

Here we analyze whether \(nReg > 1\) is beneficial over \(nReg = 1\), i.e., whether the bound (on the counters) identified when \(nReg>1\) is smaller than the bound identified when \(nReg=1\). Formally, if the bound identified with \(nReg > 1\) is denoted as new [Eq. (2) in Sect. 1.1.1] and the bound identified with \(nReg=1\) is denoted as old [Eq. (1) in Sect. 1.1.1], then we would like to check whether the following holds:

$$\begin{aligned}&old - new \ge 0\nonumber \\&\quad \left( 3[max_{inc} (11 + 3max_r)]\right) \nonumber \\&\qquad - \left( 3[max_{inc} '(nReg^2 + 5(nReg+1)+max_r'(nReg+2))]\right) \ge 0, \nonumber \\&\qquad (\textit{where } nReg > 1) \nonumber \\&\quad \left( max_{inc} (11 + 3max_r)\right) \nonumber \\&\qquad - \left( max_{inc} '(nReg^2 + 5(nReg+1)+max_r'(nReg+2))\right) \ge 0 \end{aligned}$$
(3)

If the growth in the counters is distributed uniformly across time, then when the region size becomes smaller (or larger) the bound on the growth in counters within a region becomes smaller (or larger, respectively). In other words, as nReg becomes larger, \(\mathcal {RS}\) (region size) becomes smaller and \(max_{inc} \) (the maximum growth in counters within a region) becomes smaller; and as nReg becomes smaller, \(\mathcal {RS}\) becomes larger and \(max_{inc} \) becomes larger. We apply this notion to the second half of the above inequality: under the uniform-growth assumption, \(max_{inc} ' = max_{inc} /nReg\). So the above inequality can be rewritten as,

$$\begin{aligned}&\left( max_{inc} (11 + 3max_r)\right) \nonumber \\&\quad - \left( \dfrac{max_{inc}}{nReg} \left( nReg^2 + 5(nReg+1) + max_r'(nReg+2)\right) \right) \ge 0 \end{aligned}$$
(4)

Also, recall from Sect. 5.3 that \(max_r\) stands for \(max(r_b +r_f)\). So as nReg increases, \(max_r\) increases proportionally, i.e., \(max_r' = nReg \cdot max_r\). So the above inequality becomes,

$$\begin{aligned}&\left( max_{inc} (11 + 3max_r)\right) \nonumber \\&\quad - \left( \dfrac{max_{inc}}{nReg} \left( nReg^2 + 5(nReg+1) + (nReg \cdot max_r)(nReg+2)\right) \right) \ge 0 \end{aligned}$$
(5)

Dividing the above inequality by \(max_{inc} \) (which is positive) and simplifying results in the following inequality:

$$\begin{aligned} (6-nReg-\dfrac{5}{nReg}+ max_r*(1-nReg)) \ge 0 \end{aligned}$$
(6)
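
The simplification from (5) to (6) amounts to dividing by \(max_{inc} \) and collecting terms; the following optional check (ours, assuming the sympy library is available) verifies that the left-hand side of (5) equals \(max_{inc} \) times the left-hand side of (6):

```python
# Symbolic check (ours) that inequality (5) reduces to inequality (6).
import sympy as sp

max_inc, max_r, nReg = sp.symbols('max_inc max_r nReg', positive=True)

lhs5 = max_inc * (11 + 3 * max_r) - (max_inc / nReg) * (
    nReg ** 2 + 5 * (nReg + 1) + (nReg * max_r) * (nReg + 2))
lhs6 = 6 - nReg - 5 / nReg + max_r * (1 - nReg)

# lhs5 == max_inc * lhs6, so (5) >= 0 iff (6) >= 0 (since max_inc > 0).
assert sp.simplify(lhs5 - max_inc * lhs6) == 0
```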

Here \(max_r>0\), and we will analyze the following two cases (i) \(max_r=1\), (ii) \(max_r>1\).

Substituting (i) in (6), we obtain \(1 \le nReg \le 2.5\). So, since we consider integer \(nReg > 1\), nReg has to be 2 for inequality (3) to be true. In other words, the bound on the counters when \(nReg>1\) is smaller than the bound obtained with \(nReg=1\) only for the case where \(nReg=2\). Observe that this is true only if the growth in the counters is distributed uniformly over time and if \(max_r=1\).

Substituting (ii) in (6), we obtain \(1 \le nReg \le 1.67\) (approximately, for \(max_r=2\); the range shrinks further as \(max_r\) grows). So we observe that if \(max_r>1\) then the only beneficial choice is \(nReg=1\), i.e., modeling the region size such that the clocks of any two processes (or the physical clock of any process and the global clock) differ by at most 1 region.
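
The two cases can also be checked numerically; the following short sketch (ours) evaluates the left-hand side of (6) for integer values of nReg greater than 1:

```python
# Numeric check (ours) of cases (i) and (ii): for which integer nReg > 1 does
# inequality (6) hold?

def lhs6(n_reg, max_r):
    return 6 - n_reg - 5 / n_reg + max_r * (1 - n_reg)

for max_r in (1, 2, 3):
    feasible = [n for n in range(2, 10) if lhs6(n, max_r) >= 0]
    print(max_r, feasible)
# max_r = 1 -> [2]   (only nReg = 2 improves on the nReg = 1 bound)
# max_r = 2 -> []    (no nReg > 1 is beneficial)
# max_r = 3 -> []
```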

1.2 Summary of notations

Generic variables

p : Program
\(V_p\) : Set of variables of program p
\(SV_p\) : Dynamic-sized equivalent of \(V_p\), i.e., a dynamically changing collection of only simple variables, obtained by unraveling complex variables of \(V_p\) into the simple variables contained in them
\(A_p\) : Set of actions of program p
s : State of program p
\(s_l\) : lth state in a computation of program p
guard : Condition involving variables in \(V_p\)
statement : Task involving update of a subset of variables in \(V_p\)
\(\rho \), \(\rho '\) : Computation prefixes
x : Variable in \(V_p\)
x(s) : Value of variable x in state s
fc : Free counter
\(fc(s_l)\) : Value of free counter fc in state \(s_l\)
w, a, d : Positive integers unless specified otherwise
dc : Dependent counter
S : Set of states
\(\mathcal {RS} \) : Region size
\(t_g\) : Abstract global time
\(t_j\) : Physical time at process j
\(\lfloor \frac{t_g}{\mathcal {RS}} \rfloor \) : Abstract global region
\(\lfloor \frac{t_j}{\mathcal {RS}} \rfloor \) : Region of process j
r : Region
\(r_b\), \(r_f\) : Used to characterize the life of a dependent counter in terms of regions
\(max_r\) : Maximum of \((r_b+r_f)\) of any dependent counter
\(max_{inc}\) : Maximum increase in any free counter within a global region
\(p'\) : Program obtained by applying our transformation algorithm to program p
B : \(3[max_{inc} (11 + 3max_r)]\), or 3 times the range of any dependent counter

Variables in Katz and Perry example

x, y : Round numbers
nr : Next round
cr : Current round
lr : Round number when the last real reset was performed
b : Boolean variable that identifies if the reset was real or fake

Variables in Lamport's logical clocks example

j, k : Processes
cl.j : Logical clock value of process j
m : Message
cl.m : Message timestamp, or logical clock value associated with message m
\(channel_{j,k}\) : Complex variable that contains the timestamps of messages in transit between process j and process k
v : Number of regions within which a message is guaranteed to be delivered at the receiver process

Variables in vector clocks example

vc.j : Vector clock maintained at process j
vc.j.k : Highest clock or counter value of process k that process j is aware of

Cite this article

Tekken Valapil, V., Kulkarni, S.S. Preserving stabilization while practically bounding state space using incorruptible partially synchronized clocks. Distrib. Comput. 33, 423–443 (2020). https://doi.org/10.1007/s00446-019-00365-z
