Elsevier

Information Sciences

Volume 574, October 2021, Pages 162-175
Synchronizing billion-scale automata

https://doi.org/10.1016/j.ins.2021.05.072

Highlights

  • Existing synchronization heuristics do not scale due to quadratic space complexity.

  • We propose a simple approach to avoid memory usage thanks to massive parallelism.

  • We use different parallelization approaches on CPUs and GPUs, in a hybrid way.

  • A different treatment of parallelism is useful at different phases of the algorithm.

  • Our algorithms can synchronize a billion-state automaton in around 4 mins.

Abstract

Synchronizing sequences for large-scale automata have gained popularity recently due to their practical use cases, especially in making the testing process faster and better. In many applications, shorter sequences imply less overhead and faster processing, but the problem of finding a shortest synchronizing sequence is NP-hard and calls for heuristic approaches. State-of-the-art heuristics obtain desirably short sequences with relatively small execution times. However, all these heuristics suffer from quadratic memory complexity and fail to scale as the input automaton grows. In this paper, we propose an approach exploiting GPUs and hybrid parallelism which can generate synchronizing sequences even for billion-scale automata in a short amount of time. Overall, the algorithm can generate a synchronizing sequence for a random automaton with $n = 10^8$ states in 12.1 s, $n = 5 \times 10^8$ states in 69.1 s, and a billion states in 148.2 s.

Introduction

Designing and developing a large-scale, correct, and complex system is not an easy task. Several validation techniques have been proposed to build some confidence in the developed systems, but testing stands out as one of the most practical ones [23]. To automate the testing process, there has been much interest in testing with Finite State Machines (FSMs), e.g., see [8], [13], [33], [14], [25], [29]. To employ FSMs for testing, one needs to bring the system under test (SUT) to a particular state. This is easy when a trusted reset input exists in the SUT. However, such a reset input is not always available.

A synchronizing sequence (also known as a reset sequence or a reset word) for an FSM is a sequence of inputs such that, when applied to the FSM, the machine ends up in a particular state no matter which state it starts in. Therefore, a synchronizing sequence is a compound reset input for a machine. The shorter the synchronizing sequence, the quicker the synchronization process. Hence, shorter reset sequences are desirable in terms of synchronization time and energy spent. However, the problem of finding a shortest synchronizing sequence is NP-hard [11]. It is conjectured that for a synchronizing automaton with $n$ states, the length of the shortest synchronizing sequence is at most $(n-1)^2$, which is known as the Černý Conjecture in the literature [6], [7]. Posed half a century ago, the conjecture is still open, but it has recently been verified for all binary automata with at most 12 states and all ternary automata with at most 8 states by using high-performance computing [21]. Furthermore, it has been shown that the probability that the conjecture does not hold for a random synchronizing binary automaton is exponentially small in terms of the number of states [5].
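To make the definition concrete, the following is a minimal sketch in Python, not taken from the paper. It checks whether a word is synchronizing and, for tiny automata only, finds the length of a shortest synchronizing word by a breadth-first search over subsets of states. The table layout `delta[s][x]`, the function names, and the example automaton (the classical 4-state Černý automaton, written here with letter 0 as a cyclic shift and letter 1 sending state 3 to state 0) are illustrative assumptions.

```python
from collections import deque


def image(delta, states, word):
    """Apply `word` letter by letter to every state in `states`."""
    current = frozenset(states)
    for x in word:
        current = frozenset(delta[s][x] for s in current)
    return current


def is_synchronizing_word(delta, word):
    """True if `word` sends all states of the automaton to a single state."""
    return len(image(delta, range(len(delta)), word)) == 1


def shortest_sync_length(delta, num_letters):
    """Length of a shortest synchronizing word, or None if none exists.

    Breadth-first search over subsets of states: exponential in the number
    of states, so this is only usable for very small automata.
    """
    start = frozenset(range(len(delta)))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if len(cur) == 1:
            return dist[cur]
        for x in range(num_letters):
            nxt = frozenset(delta[s][x] for s in cur)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


# Assumed example: 4-state Cerny automaton, delta[state][letter].
# Letter 0 shifts i -> (i + 1) mod 4; letter 1 maps 3 -> 0 and fixes the rest.
cerny4 = [[1, 0], [2, 1], [3, 2], [0, 0]]
word = [1, 0, 0, 0, 1, 0, 0, 0, 1]

print(is_synchronizing_word(cerny4, word))
print(len(word), shortest_sync_length(cerny4, 2))
```

For this automaton the word has length 9, which equals $(n-1)^2$ for $n = 4$, and the subset search reports the same value as the shortest length, so the conjectured bound is met with equality here.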

The motivation to study synchronizing sequences comes not only from the testing domain but also from different fields including automata theory, robotics, bio-computing, set theory, propositional calculus, model-based testing, and many more, e.g., [17], [32], [2], [4], [27], [26]. For a survey of applications of synchronizing sequences, we refer the reader to [34], which presents applications of synchronizing words together with a survey of theoretical results related to synchronizing automata. In this work, we focus on large-scale automata and FSMs since these automata/FSM models are typically not designed manually in practice. Instead, a high-level formalism, such as SDL [16], StateCharts [12], UML [24], SystemVerilog [15], etc., is used for the design task. Analysis tools extract the underlying automata or FSMs by flattening the hierarchy, concurrency, data, and the binary encoding (e.g., in the case of hardware description languages). This flattening generally results in an enormous size for the underlying automaton/FSM. The well-known state space explosion problem in model-checking [9] is just one famous example of the scalability problems faced due to such flattening. Recently, researchers have focused on finding synchronizing sequences for large-scale automata, such as partial automata, by using high-performance computing hardware such as GPUs [31].

Due to the hardness of finding a shortest sequence, there exist heuristics in the literature, known as synchronizing heuristics, to compute short synchronizing words. Among such heuristics are Greedy by [11], Cycle by [30], SynchroP by [28], SynchroPL by [28], and FastSynchro by [22]. In terms of complexity, these heuristics are ordered as follows: Greedy/Cycle with time complexity $O(n^3 + pn^2)$, FastSynchro with time complexity $O(pn^4)$, and finally SynchroP/SynchroPL with time complexity $O(n^5 + pn^2)$, where $n$ is the number of states and $p$ is the size of the alphabet.

The fastest synchronizing heuristics, Greedy and Cycle, are also the earliest heuristics that appeared in the literature. Therefore, Greedy and Cycle are usually considered as a baseline to evaluate the quality and the performance of novel heuristics. Newer heuristics do generate shorter synchronizing words, but by performing a more complex analysis, which implies a substantial runtime increase. The speed of Greedy and Cycle is unmatched to date. Yet, it has recently been shown that they can be implemented in a much faster way via various optimizations [20]. More optimizations have been proposed for the slower but better heuristics [1], and their parallelization has also been studied in the literature [18].

All the aforementioned heuristics work in two phases; in the first phase, an auxiliary data structure is generated to summarize the shortest sequences that merge state pairs. In the second phase, by using this data structure, the sequence is constructed by concatenating some of these pairwise shortest sequences. Hence, the memory consumption of the heuristics is at least quadratic in terms of n. This complexity makes all the heuristics impractical even for automata with hundreds of thousands of states.
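The quadratic growth can be made tangible with a back-of-the-envelope estimate. The sketch below assumes one table entry per unordered pair of distinct states and a cost of 8 bytes per entry. Both the function name and the per-entry cost are assumptions chosen for illustration, not figures taken from the heuristics' implementations.

```python
def pair_table_bytes(n, bytes_per_pair=8):
    """Rough size of a table with one entry per unordered pair of states.

    `bytes_per_pair` is an assumed cost; real implementations may store
    more or less per pair.
    """
    pairs = n * (n - 1) // 2
    return pairs * bytes_per_pair


for n in (10**5, 10**8, 10**9):
    gigabytes = pair_table_bytes(n) / 1e9
    print(f"n = {n:.0e}: about {gigabytes:,.0f} GB")
```

Under these assumptions, $n = 10^5$ already needs about 40 GB, and every tenfold increase in $n$ multiplies the requirement by roughly one hundred, which is why a pairwise table is out of reach for automata with $10^8$ or more states.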

In this work, we focus on synchronizing large automata with Greedy. We modify the two-phase structure by removing the first phase and absorb the resulting extra overhead with high-performance parallel algorithms designed to utilize the power of multi-core CPUs or Graphics Processing Units (GPUs). For an effective and efficient parallelization, we observed how the synchronization behavior changes over the course of the execution and tried to utilize the CPU/GPU cores to their full potential. In our experiments, we obtained around 6.9× and 11.2× speedups with 16 CPU threads on automata having $n = 10^8$ and $n = 5 \times 10^8$ states, respectively. By utilizing a GPU after the pairwise synchronizing paths get longer, the speedups, compared to the sequential execution, increase to 12.1× and 20.0×. Overall, via a hybrid solution using both CPU and GPU, we could synchronize a random automaton with $n = 10^8$ states in 12.1 s, $n = 5 \times 10^8$ states in 69.1 s, and a billion states in 148.2 s, whereas a single-core execution of the same algorithm takes 147.1 and 1348.7 seconds for $n = 10^8$ and $n = 5 \times 10^8$ states, respectively.

The rest of the manuscript is organized as follows: Section 2 presents the background and notation. The proposed parallel Greedy heuristic is presented in Section 3 and further optimizations are described in Section 4. Section 5 presents the experimental results and Section 6 concludes the paper.

Section snippets

Background and notation

A (complete and deterministic) automaton is defined by a triple $A = (S, \Sigma, \delta)$ where $S = \{s_1, s_2, \ldots, s_n\}$ is a finite set of $n$ states, $\Sigma$ is a finite alphabet consisting of $p$ input letters (or simply letters), and $\delta: S \times \Sigma \to S$ is a transition function.

An element of the set $\Sigma^*$ is called a word. For a word $w \in \Sigma^*$, we use $|w|$ to denote the length of $w$, and $\varepsilon$ is the empty word of length 0. We extend the transition function $\delta$ to a set of states and to a word in the usual way. We have $\delta(s, \varepsilon) = s, \forall s \in S$, and for a word $w \in \Sigma^*$ and

Parallel Greedy for a billion-scale automaton

With its quadratic memory complexity, it is impossible to execute Greedy, and also other heuristics, on automata having more than $10^5$ states; such an execution requires at least 40 GB memory. We re-structure Greedy to generate synchronizing sequences for large-scale automata. Like Greedy, the proposed approach always aims to find a shortest merging word to reduce the number of active states at each iteration. That is done in a brute-force manner, i.e., by trying all the words until one that
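The brute-force idea described in this section can be pictured with a small sequential sketch. It is only an illustration under stated assumptions: the names `image`, `brute_force_greedy`, and `max_len` are hypothetical, words are enumerated with `itertools.product` in order of increasing length, and none of the CPU/GPU parallelization or the optimizations of the paper is reproduced. The point is that no pairwise table is stored; only the current set of active states is kept.

```python
from itertools import product


def image(delta, states, word):
    """Apply `word` letter by letter to every state in `states`."""
    current = set(states)
    for x in word:
        current = {delta[s][x] for s in current}
    return current


def brute_force_greedy(delta, num_letters, max_len=12):
    """Greedy-style synchronization without a pairwise table.

    At each iteration, try all words of length 1, 2, ... and take the first
    one that shrinks the active set, i.e., a shortest merging word.
    Returns the concatenated word, or None if no merging word of length at
    most `max_len` exists (an assumed cut-off to guarantee termination).
    """
    active = set(range(len(delta)))
    sync_word = []
    while len(active) > 1:
        found = None
        for length in range(1, max_len + 1):
            for w in product(range(num_letters), repeat=length):
                if len(image(delta, active, w)) < len(active):
                    found = w
                    break
            if found is not None:
                break
        if found is None:
            return None
        sync_word.extend(found)
        active = image(delta, active, found)
    return sync_word


# Assumed toy input: 4 states, 2 letters, delta[state][letter].
toy = [[1, 0], [2, 1], [3, 2], [0, 0]]
result = brute_force_greedy(toy, 2)
print(result, len(image(toy, range(4), result)))
```

Each candidate word is evaluated independently of the others, which is what makes this kind of search a natural target for parallel hardware. The price is that the number of candidate words grows exponentially with the word length, so the sketch is only sensible when merging words are short.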

Further optimizations for large-scale automata synchronization

The aforementioned strategies improve the performance by leveraging the parallelism offered by multi-core and many-core architectures. Further improvement is possible with various extra optimizations. Without a careful implementation, the algorithms may suffer from false sharing, bad cache utilization, redundant computation, etc. Here we describe how a performance improvement can be obtained by applying intelligent optimization techniques.

Experimental results

We used two different architectures for the experiments. The preliminary CPU experiments to visualize the impact of sorting and memoization are performed on a machine running 64-bit CentOS 6.5, equipped with 64 GB RAM and a dual-socket Intel Xeon E7-4870 v2 clocked at 2.30 GHz where each socket has 15 cores (30 in total). The main experiments are performed on a machine running 64-bit Ubuntu 16.04, equipped with 1 TB RAM and a dual-socket Intel Xeon 6152 clocked at 2.10 GHz where each socket

Conclusion and future work

Finding synchronizing sequences for large-scale automata is important due to their applications in practice, especially in software testing. Since the shortest synchronizing sequence problem is NP-hard, various heuristics have been proposed, but their quadratic memory complexity prevents them from scaling to large automata. We propose an iterative algorithm which mimics the Greedy heuristic in the literature, i.e., finds a shortest sequence that decreases the set cardinality, and eventually finds

CRediT authorship contribution statement

Mustafa Kemal Taş: Software, Investigation, Writing - original draft, Visualization. Kamer Kaya: Software, Writing - original draft, Writing - review & editing, Supervision. Hüsnü Yenigün: Conceptualization, Writing - original draft, Writing - review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by TÜBİTAK grant #114E569. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.

References (35)

  • Y. Benenson et al., Programmable and autonomous computing machine made of biomolecules, Nature (2001)
  • J. Černý, Poznámka k homogénnym experimentom s konečnými automatmi, Matematicko-fyzikálny časopis (1964)
  • J. Černý et al., On directable automata, Kybernetika (1971)
  • T.S. Chow, Testing software design modeled by finite-state machines, IEEE Trans. Software Eng. (1978)
  • E.M. Clarke, W. Klieber, M. Nováček, P. Zuliani, Model Checking and the State Explosion Problem, Springer Berlin...
  • H. Don et al., Slowly synchronizing automata with fixed alphabet size, Inf. Comput. (2020)
  • D. Eppstein, Reset sequences for monotonic automata, SIAM J. Comput. (1990)