Synchronizing billion-scale automata

doi:10.1016/j.ins.2021.05.072

Information Sciences

Volume 574, October 2021, Pages 162-175

https://doi.org/10.1016/j.ins.2021.05.072

Highlights

• Existing synchronization heuristics do not scale due to quadratic space complexity.
• We propose a simple approach to avoid memory usage thanks to massive parallelism.
• We use different parallelization approaches on CPUs and GPUs, in a hybrid way.
• A different treatment of parallelism is useful at different phases of the algorithm.
• Our algorithms can synchronize a billion-state automaton in around 4 mins.

Abstract

Synchronizing sequences for large-scale automata have gained popularity recently due to their practical use cases, especially in making the testing process faster and better. In many applications, shorter sequences imply less overhead and faster processing time, but the problem of finding the shortest synchronizing sequence is NP-hard and requires heuristic approaches to be solved. State-of-the-art heuristics manage to obtain desirable, short sequences with relatively small execution times. However, all these heuristics suffer from their quadratic memory complexity and fail to scale when the input automaton gets larger. In this paper, we propose an approach exploiting GPUs and hybrid parallelism which can generate synchronizing sequences even for billion-scale automata, in a short amount of time. Overall, the algorithm can generate a synchronizing sequence for a random automaton with $n = 10^{8}$ states in 12.1 s, $n = 5 \times 10^{8}$ states in 69.1 s, and a billion states in 148.2 s.

Introduction

Designing and developing a large-scale, correct, and complex system is not an easy task. Several validation techniques have been proposed to build some confidence in the developed systems, but testing stands out as one of the most practical ones [23]. To automate the testing process, there has been much interest in testing with Finite State Machines (FSMs), e.g., see [8], [13], [33], [14], [25], [29]. To employ FSMs for testing, one needs to bring the system under test (SUT) to a particular state. It is quite easy to do that when a trusted reset input exists in the SUT. However, such a reset input is not always available.

A synchronizing sequence (also known as a reset sequence or a reset word) for an FSM is a sequence of inputs such that, when applied to the FSM, the machine ends up in a particular state no matter which state it starts in. Therefore, a synchronizing sequence is a compound reset input for a machine. The shorter the synchronizing sequence is, the quicker the synchronization process is. Hence, shorter reset sequences are desirable in terms of synchronization time and energy spent. However, the problem of finding a shortest synchronizing sequence is NP-hard [11]. It is conjectured that for a synchronizing automaton with n states, the length of the shortest synchronizing sequence is at most $(n - 1)^{2}$, which is known as the Černý Conjecture in the literature [6], [7]. Posed half a century ago, the conjecture is still open, but it was recently verified for all binary automata with at most 12 states and all ternary automata with at most 8 states by using high-performance computing [21]. Furthermore, it has been shown that the probability that the conjecture does not hold for a random synchronizing binary automaton is exponentially small in terms of the number of states [5].
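To make the definition concrete, the following Python sketch finds a shortest synchronizing word by breadth-first search over subsets of states. This exhaustive search is exponential in n and is meant only as an illustration on tiny automata; it is not one of the heuristics discussed in this paper. The helper names (`cerny_automaton`, `shortest_sync_word`) and the table encoding `delta[s][a]` are assumptions made for this example. The test input is the classical Černý automaton, whose shortest synchronizing word is known to have length exactly $(n - 1)^{2}$.

```python
from collections import deque


def cerny_automaton(n):
    """Transition table delta[s][a] of the n-state Cerny automaton.

    Letter 0 rotates the states cyclically; letter 1 maps state n-1 to
    state 0 and fixes every other state.
    """
    return [[(s + 1) % n, 0 if s == n - 1 else s] for s in range(n)]


def shortest_sync_word(delta, p):
    """Return a shortest synchronizing word as a list of letters, or None."""
    start = frozenset(range(len(delta)))
    parent = {start: None}  # subset -> (previous subset, letter used)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if len(current) == 1:
            # Walk the parent links back to the full state set.
            word = []
            while parent[current] is not None:
                current, letter = parent[current]
                word.append(letter)
            return word[::-1]
        for a in range(p):
            image = frozenset(delta[s][a] for s in current)
            if image not in parent:
                parent[image] = (current, a)
                queue.append(image)
    return None  # no singleton is reachable: the automaton is not synchronizing


word = shortest_sync_word(cerny_automaton(4), 2)
print(len(word))  # (4 - 1)**2 = 9
```

Because the search visits subsets in order of distance from the full state set, the first singleton it reaches yields a shortest word. The number of subsets grows as $2^{n}$, which is why heuristics are needed beyond toy sizes.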

The motivation to study synchronizing sequences comes not only from the testing domain but also from different fields including automata theory, robotics, bio-computing, set theory, propositional calculus, model-based testing, and many more, e.g., [17], [32], [2], [4], [27], [26]. For a survey of applications of synchronizing sequences, we refer the reader to [34], in which applications of synchronizing words are presented together with a survey of theoretical results related to synchronizing automata. In this work, we focus on large-scale automata and FSMs since typically these automata/FSM models are not manually designed in practice. Instead, a high-level formalism, such as SDL [16], StateCharts [12], UML [24], SystemVerilog [15], etc., is used for the design task. Analysis tools extract the underlying automata or FSMs by flattening the hierarchy, concurrency, data, and the binary encoding (e.g., in the case of hardware description languages). This flattening generally results in an enormous size for the underlying automaton/FSM. The well-known state space explosion problem in model-checking [9] is just one famous example of the scalability problems faced due to such flattening. Recently, researchers have focused on finding synchronizing sequences for large-scale automata such as partial automata by using high-performance computing hardware such as GPUs [31].

Due to the hardness of finding a shortest sequence, there exist heuristics in the literature, known as synchronizing heuristics, to compute short synchronizing words. Among such heuristics are Greedy by [11], Cycle by [30], SynchroP by [28], SynchroPL by [28], and FastSynchro by [22]. In terms of complexity, these heuristics are ordered as follows: Greedy/Cycle with time complexity $O(n^{3} + pn^{2})$, FastSynchro with time complexity $O(pn^{4})$, and finally SynchroP/SynchroPL with time complexity $O(n^{5} + pn^{2})$, where n is the number of states and p is the size of the alphabet.

The fastest synchronizing heuristics, Greedy and Cycle, are also the earliest heuristics that appeared in the literature. Therefore, Greedy and Cycle are usually considered as a baseline to evaluate the quality and the performance of novel heuristics. Newer heuristics do generate shorter synchronizing words, but by performing a more complex analysis, which implies a substantial runtime increase. The speed of Greedy and Cycle is unmatched to date. Yet, it has been recently shown that they can be implemented in a much faster way via various optimizations [20]. More optimizations have been proposed for the slower but better heuristics [1], and their parallelization has also been studied in the literature [18].

All the aforementioned heuristics work in two phases; in the first phase, an auxiliary data structure is generated to summarize the shortest sequences that merge state pairs. In the second phase, by using this data structure, the sequence is constructed by concatenating some of these pairwise shortest sequences. Hence, the memory consumption of the heuristics is at least quadratic in terms of n. This complexity makes all the heuristics impractical even for automata with hundreds of thousands of states.
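A compact Python sketch of this two-phase scheme, in the spirit of Greedy, is given below. It is an illustrative assumption-laden version, not the optimized implementations from the literature: states are numbered 0..n-1, letters 0..p-1, the automaton is a table `delta[s][a]`, and the function name `greedy_sync` is hypothetical.

```python
from collections import deque
from itertools import combinations


def greedy_sync(delta, p):
    """Two-phase Greedy sketch: returns a synchronizing word or None."""
    n = len(delta)

    # Inverse transitions: inv[t][a] lists the states s with delta[s][a] == t.
    inv = [[[] for _ in range(p)] for _ in range(n)]
    for s in range(n):
        for a in range(p):
            inv[delta[s][a]][a].append(s)

    # Phase 1: backward BFS over unordered state pairs, starting from the
    # singletons. dist[(i, j)] is the length of a shortest word merging i and
    # j, and first[(i, j)] is the first letter of one such word.
    dist = {(s, s): 0 for s in range(n)}
    first = {}
    queue = deque(dist)
    while queue:
        u, v = queue.popleft()
        for a in range(p):
            for x in inv[u][a]:
                for y in inv[v][a]:
                    key = (x, y) if x <= y else (y, x)
                    if key not in dist:
                        dist[key] = dist[(u, v)] + 1
                        first[key] = a
                        queue.append(key)
    if len(dist) < n * (n + 1) // 2:
        return None  # some pair cannot be merged: not synchronizing

    # Phase 2: repeatedly merge the active pair with the shortest merging word.
    active = set(range(n))
    word = []
    while len(active) > 1:
        i, j = min(combinations(sorted(active), 2), key=lambda pair: dist[pair])
        while i != j:
            a = first[(i, j)]
            word.append(a)
            active = {delta[s][a] for s in active}
            i, j = sorted((delta[i][a], delta[j][a]))
    return word
```

The `dist` and `first` dictionaries hold one entry per unordered state pair, which is the quadratic memory footprint referred to above.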

In this work, we focus on synchronizing large automata with Greedy. We modified the two-phase structure by removing the first phase and absorbing the resulting extra computation with high-performance, parallel algorithms designed to utilize the power of multi-core CPUs or Graphics Processing Units (GPUs). For an effective and efficient parallelization, we observed the changes in the synchronization behavior and tried to utilize the CPU/GPU cores to their full potential. In our experiments, we obtained around $6.9 \times$ and $11.2 \times$ speedups with 16 CPU threads, on automata having $n = 10^{8}$ and $n = 5 \times 10^{8}$ states, respectively. By utilizing a GPU after the pairwise synchronizing paths get longer, the speedups, compared to the sequential execution, are increased to $12.1 \times$ and $20.0 \times$. Overall, via a hybrid solution using both CPU and GPU, we could synchronize a random automaton with $n = 10^{8}$ states in 12.1 s, $n = 5 \times 10^{8}$ states in 69.1 s, and a billion states in 148.2 s, where a single-core execution of the same algorithm takes 147.1 s and 1348.7 s for $n = 10^{8}$ and $n = 5 \times 10^{8}$ states, respectively.
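One naive, sequential reading of "removing the first phase" can be sketched in Python as follows: instead of precomputing pairwise merging words, each iteration enumerates words of increasing length until one shrinks the set of active states. This is only an illustration under stated assumptions (table encoding `delta[s][a]`, hypothetical function names); the parallel CPU/GPU algorithms of the paper are not reproduced here.

```python
from itertools import product


def apply_word(delta, states, word):
    """Image of a set of states under a word (a sequence of letters)."""
    for a in word:
        states = {delta[s][a] for s in states}
    return states


def shortest_reducing_word(delta, p, active):
    """Shortest word that shrinks `active`, found by plain enumeration.

    If any two active states can be merged at all, a shortest merging word
    has length at most n(n-1)/2 (the number of non-singleton pairs), so the
    search stops at that length.
    """
    n = len(delta)
    for length in range(1, n * (n - 1) // 2 + 1):
        for word in product(range(p), repeat=length):
            if len(apply_word(delta, active, word)) < len(active):
                return list(word)
    return None  # no pair of active states can be merged


def memoryless_greedy(delta, p):
    """Greedy-like loop that keeps only the active set and the output word."""
    active = set(range(len(delta)))
    sequence = []
    while len(active) > 1:
        word = shortest_reducing_word(delta, p, active)
        if word is None:
            return None  # the automaton is not synchronizing
        sequence += word
        active = apply_word(delta, active, word)
    return sequence
```

The sketch stores only the active set and the growing sequence, so its memory use is linear in n, but the enumeration costs up to $p^{\ell}$ word evaluations for merging words of length $\ell$. That extra work is the overhead that has to be absorbed by parallel hardware.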

The rest of the manuscript is organized as follows: Section 2 presents the background and notation. The proposed parallel Greedy heuristic is presented in Section 3 and further optimizations are described in Section 4. Section 5 presents the experimental results and Section 6 concludes the paper.

Section snippets

Background and notation

A (complete and deterministic) automaton is defined by a triple $A = (S, Σ, δ)$ where $S = \{s_{1}, s_{2}, \dots, s_{n}\}$ is a finite set of n states, $Σ$ is a finite alphabet consisting of p input letters (or simply letters), and $δ : S \times Σ \to S$ is a transition function.
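As a minimal sketch of this definition, the Python class below stores $δ$ as a table and applies it letter by letter to words and to sets of states, which is the usual way the transition function is extended. The class name `Automaton`, the 0-based numbering of states and letters, and the method names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Automaton:
    """Complete deterministic automaton with states 0..n-1 and letters 0..p-1."""
    n: int
    p: int
    delta: Tuple[Tuple[int, ...], ...]  # delta[s][a] is the successor of s on a

    def __post_init__(self):
        # Completeness: every state needs exactly one successor per letter.
        if len(self.delta) != self.n:
            raise ValueError("delta needs one row per state")
        for row in self.delta:
            if len(row) != self.p or any(not 0 <= t < self.n for t in row):
                raise ValueError("each row needs p successors in range(n)")

    def run(self, state, word):
        """State reached from `state` after reading `word` letter by letter."""
        for a in word:
            state = self.delta[state][a]
        return state

    def image(self, states, word):
        """Set of states reached from any state in `states` after `word`."""
        return frozenset(self.run(s, word) for s in states)


A = Automaton(n=3, p=2, delta=((1, 0), (2, 1), (0, 0)))
print(A.run(0, []))                              # empty word: stays in state 0
print(sorted(A.image({0, 1, 2}, [1, 0, 0, 1])))  # [0], so the word synchronizes A
```

A flat table gives constant-time lookup of $δ(s, x)$ and needs $n \cdot p$ entries, i.e., memory linear in the number of states for a fixed alphabet.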

An element of the set $Σ^{★}$ is called a word. For a word $w \in Σ^{★}$ , we use $| w |$ to denote the length of w, and $ε$ is the empty word of length 0. We extend the transition function $δ$ to a set of states and to a word in the usual way. We have $δ (s, ε) = s, \forall s \in S$ , and for a word $w \in Σ^{★}$ and

Parallel Greedy for a billion-scale automaton

Because of their quadratic memory complexity, Greedy and the other heuristics cannot be executed in practice on automata having more than $10^{5}$ states; such an execution requires at least 40 GB of memory. We restructure Greedy to generate synchronizing sequences for large-scale automata. Like Greedy, the proposed approach always aims to find a shortest merging word to reduce the number of active states at each iteration. That is done in a brute-force manner, i.e., by trying all the words until one that

Further optimizations for large-scale automata synchronization

The aforementioned strategies improve the performance by leveraging the parallelism offered by multi-core and many-core architectures. Further improvement is possible with various extra optimizations. Without a careful implementation, the algorithms may suffer from false sharing, bad cache utilization, redundant computation, etc. Here we describe how a performance improvement can be obtained by applying intelligent optimization techniques.

Experimental results

We used two different architectures for the experiments. The preliminary CPU experiments to visualize the impact of sorting and memoization are performed on a machine running 64-bit CentOS 6.5 equipped with 64 GB RAM and a dual-socket Intel Xeon E7-4870 v2 clocked at 2.30 GHz where each socket has 15 cores (30 in total). The main experiments are performed on a machine running 64-bit Ubuntu 16.04 equipped with 1 TB RAM and a dual-socket Intel Xeon 6152 clocked at 2.10 GHz where each socket

Conclusion and future work

Finding synchronizing sequences for large-scale automata is important due to their applications in practice, especially in software testing. Since the shortest synchronizing sequence problem is NP-hard, various heuristics have been proposed, but their quadratic memory complexity prevents them from scaling to large automata. We propose an iterative algorithm which mimics the Greedy heuristic in the literature, i.e., finds a shortest sequence that decreases the set cardinality, and eventually finds

CRediT authorship contribution statement

Mustafa Kemal Taş: Software, Investigation, Writing - original draft, Visualization. Kamer Kaya: Software, Writing - original draft, Writing - review & editing, Supervision. Hüsnü Yenigün: Conceptualization, Writing - original draft, Writing - review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by TÜBİTAK grant #114E569. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.

References (35)

D.S. Ananichev et al., Synchronizing monotonic automata, Theor. Comput. Sci. (2004)
M.V. Berlinkov et al., Algebraic synchronization criterion and computing reset words, Inf. Sci. (2016)
G. Jourdan et al., Reduced checking sequences using unreliable reset, Inf. Process. Lett. (2015)
S. Karahoda et al., Multicore and manycore parallelization of cheap synchronizing sequence heuristics, J. Parallel Distrib. Comput. (2020)
S. Karahoda et al., Synchronizing heuristics: Speeding up the fastest, Expert Syst. Appl. (2018)
R. Kudlacik et al., Effective synchronizing algorithms, Expert Syst. Appl. (2012)
A. Rezaki et al., Construction of checking sequences based on characterization sets, Comput. Commun. (1995)
A. Roman, Synchronizing finite automata with short reset words, Appl. Math. Comput. (2009)
O.F. Altun, K. Atam, S. Karahoda, K. Kaya, Synchronizing heuristics: Speeding up the slowest, in: Testing Software and...
D.S. Ananichev et al., Synchronizing automata with a letter of deficiency 2
Y. Benenson et al., Programmable and autonomous computing machine made of biomolecules, Nature (2001)
J. Černý, Poznámka k homogénnym experimentom s konečnými automatmi, Matematicko-fyzikálny časopis (1964)
J. Černý et al., On directable automata, Kybernetika (1971)
T.S. Chow, Testing software design modeled by finite-state machines, IEEE Trans. Software Eng. (1978)
E.M. Clarke, W. Klieber, M. Nováček, P. Zuliani, Model Checking and the State Explosion Problem, Springer Berlin...
H. Don et al., Slowly synchronizing automata with fixed alphabet size, Inf. Comput. (2020)
D. Eppstein, Reset sequences for monotonic automata, SIAM J. Comput. (1990)

