A novel learning algorithm for Büchi automata based on family of DFAs and classification trees
Introduction
In the past two decades, learning-based automata inference techniques [2], [3], [4], [5] have received significant attention from the formal verification community. The primary applications of automata learning in this community fall into two categories: improving the efficiency and scalability of verification techniques [6], [7], [8], [9], [10], [11], [12], [13] and synthesizing abstract system models for further analysis [14], [15], [16], [17], [18], [19], [20], [21], [22], [23].
The former is usually based on the so-called assume-guarantee compositional verification approach, which divides a verification task into several subtasks via a composition rule. Learning algorithms are applied to automatically construct the environmental assumptions of the components in the rule. For the latter, automata learning algorithms have been used to automatically generate interface models of computer programs [17], [18], [19], [20], [24], to extract a model of system error traces for diagnostic purposes [22], to obtain a behavior model of programs for statistical program analysis [23], and to perform model-based testing and verification [14], [15], [16]. Later, Vaandrager [25] explained the concept of model learning used in the above applications. In particular, robust libraries for finite automata learning are available, e.g., libalf [26] and LearnLib [27].
Besides the classical finite automata learning algorithms, researchers have also developed learning algorithms for richer models and applied them to the above two applications. For example, learning algorithms for register automata [28], [29] have been developed and applied to synthesizing program interface models. For timed automata, learning algorithms have been developed to automate the compositional verification of timed systems [10] and to verify specifications of the TCP protocol [30]. However, all of the above results concern checking safety properties or synthesizing finite behavior models of systems and programs. Büchi automata are a standard model for describing liveness properties of distributed systems [31]. They have been widely applied in the automata-based model checking framework [32] to describe the properties to be verified, as well as in the synthesis of reactive systems [33]. Moreover, Büchi automata have been used as a means to prove the termination of programs [34]. Therefore, in order to verify with learning algorithms whether a system satisfies a liveness property, a learning algorithm for Büchi automata can be employed.
Motivated by this, Maler and Pnueli introduced in [35] the first learning algorithm for Büchi automata, which is, however, only able to learn a strict subclass of ω-regular languages. The first learning algorithm for Büchi automata accepting the complete class of ω-regular languages was described in [36], based on the L* algorithm [4] and the result of [37]. However, unlike finite automata learning, research on applying Büchi learning algorithms to verification problems is still in its infancy, despite the popularity of Büchi automata in the community.
One reason why learning algorithms for Büchi automata have seldom been used is that they are currently not efficient enough for model checking. Recently, Angluin and Fisman proposed a learning algorithm in [1] that learns a formalism called family of DFAs (FDFAs), based on the results of [38]. The main barrier to applying their learning algorithm in verification is that it requires a teacher for FDFAs. To the best of our knowledge, FDFAs have not yet been applied in verification, while Büchi automata have already been used in several areas such as program termination analysis [39] and probabilistic verification [40]. As a main contribution, in this paper, we show that the FDFA learning algorithm in [1] can be adapted to support Büchi automata teachers.
To further improve the efficiency of Büchi learning algorithms, in this paper we propose a novel learning algorithm for Büchi automata accepting the complete class of ω-regular languages, based on FDFAs and a classification tree structure (inspired by the tree-based algorithm in [3] and the TTT learning algorithm in [41]). In terms of worst-case storage space, the space required by our algorithm is quadratically better than that of the table-based algorithm proposed in [1]. We implement our learning algorithm for Büchi automata in the library ROLL [42] (Regular Omega Language Learning, http://iscasmc.ios.ac.cn/roll), which includes all other learning algorithms for Büchi automata accepting the complete class of ω-regular languages available in the literature. We compare the performance of those algorithms using a benchmark of 295 Büchi automata corresponding to all 295 LTL specifications available in Büchi Store [43], as well as 20 Büchi automata whose languages cannot be specified by LTL formulas. Experimental results show that our tree-based algorithms solve more learning tasks than the other algorithms.
To summarize, our contributions are the following. (1) An adaptation of the algorithm in [1] to support Büchi automata teachers. (2) A novel Büchi automata learning algorithm for the complete class of ω-regular languages based on FDFAs and classification trees. (3) A comprehensive empirical evaluation, using ROLL, of all the Büchi automata learning algorithms available in the literature.
A previous version of our learning algorithm appeared in [44]. Compared to the previous version, we have added more examples and intuitions about the proposed learning algorithms. For instance, we have added Fig. 2 in order to give readers an idea of the three different types of canonical FDFAs. We have also provided detailed proofs and a complexity analysis. Many of the proofs given here are not trivial, so we include them in the hope that readers may benefit from the ideas in their own work.
Another contribution of this paper is that we extend the learning algorithm for Büchi automata proposed in [44] to a learning algorithm for limit deterministic Büchi automata. Limit deterministic Büchi automata are a new variety of Büchi automata introduced in [40], [45] for the qualitative verification of Markov Decision Processes (MDPs). More precisely, our learned limit deterministic Büchi automata have two components, namely the initial component and the accepting component, where both components are deterministic and all accepting states are contained in the accepting component. Nondeterminism occurs only on the transitions from the initial component to the accepting component. We are aware that the same Büchi automata are also defined in [46]. Moreover, limit deterministic Büchi automata are widely used in program termination analysis according to [39], [47]. Therefore, it is intriguing to see whether we can apply our learning algorithm in probabilistic verification and program analysis, and we leave this to future work.
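The two-component shape of the learned limit deterministic Büchi automata described above can be made concrete with a small check. The following Python sketch is only an illustration: the dictionary encoding of the transition relation, the function name, and the requirement that the accepting component is never left again are assumptions made here, not definitions taken from the paper.

```python
def is_limit_deterministic(delta, initial_part, accepting_part, accepting):
    """Check the two-component shape of a Büchi automaton.

    delta maps (state, letter) to a set of successor states (assumed encoding).
    initial_part / accepting_part are the state sets of the two components.
    """
    # All accepting states must lie in the accepting component.
    if not accepting <= accepting_part:
        return False
    for (state, _letter), successors in delta.items():
        if state in accepting_part:
            # Deterministic, and (assumption) the component is never left.
            if len(successors) > 1 or not successors <= accepting_part:
                return False
        elif len(successors & initial_part) > 1:
            # Inside the initial component, at most one successor per letter;
            # jumps into the accepting component may be nondeterministic.
            return False
    return True


# Hypothetical automaton for "finitely many a" over {a, b}.
delta = {(0, "a"): {0}, (0, "b"): {0, 1}, (1, "b"): {1}}
print(is_limit_deterministic(delta, {0}, {1}, {1}))
```

In this example the only nondeterministic choice is the jump from state 0 to state 1 on the letter b, which is exactly the kind of transition the description allows.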
Preliminaries
Let ⊕ be the standard modular arithmetic operator. Let A and B be two sets. The symmetric difference of A and B is the set (A \ B) ∪ (B \ A). We use to denote the set . Let Σ be a finite non-empty set of letters called alphabet. A word is a finite or infinite sequence of letters in Σ. We use ϵ to represent an empty word. The set of all finite words is denoted by Σ*, and the set of all infinite words, called ω-words, is denoted by Σ^ω. Moreover, we also
Representations of ω-regular languages
The first representation of ω-regular languages introduced here is Büchi automata, which were originally introduced by Julius Richard Büchi in [49] and are now widely used in model checking. A Büchi automaton (BA) has the same structure as an FA, except that it accepts only infinite words. A run of a BA on an infinite word is defined similarly to that of an FA except that, instead of ending in a state, it visits an infinite sequence of states. An infinite word w is accepted by a BA A iff A
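For ultimately periodic words of the form u·v^ω, acceptance by a BA can be decided by a finite search. The Python sketch below assumes the standard Büchi condition (some run visits an accepting state infinitely often) and a hypothetical dictionary encoding of the transition relation. It is an illustration only, not the implementation used in ROLL.

```python
def v_step(delta, accepting, p, v):
    """Pairs (q, flag): q is reachable from p by reading v, and flag tells
    whether that run visited an accepting state after leaving p."""
    current = {(p, False)}
    for a in v:
        current = {(q, f or q in accepting)
                   for (s, f) in current
                   for q in delta.get((s, a), ())}
    return current


def accepts_lasso(delta, init, accepting, u, v):
    """Decide whether the BA accepts u v^omega (v must be non-empty)."""
    assert v, "the period v must be non-empty"
    # States reachable from the initial state after reading the prefix u.
    states = {init}
    for a in u:
        states = {q for s in states for q in delta.get((s, a), ())}
    # Graph whose edges summarise one full reading of v.
    edges, todo = {}, list(states)
    while todo:
        p = todo.pop()
        if p not in edges:
            edges[p] = v_step(delta, accepting, p, v)
            todo.extend(q for q, _ in edges[p])

    def reach(src):
        # States reachable from src in one or more v-steps.
        seen, stack = set(), [src]
        while stack:
            s = stack.pop()
            for q, _ in edges[s]:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    # Accept iff some flagged edge p -> q lies on a cycle (q leads back to p).
    return any(f and (p == q or p in reach(q))
               for p in edges for (q, f) in edges[p])


# Hypothetical BA for "infinitely many a" over {a, b}; state 1 is accepting.
delta = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {0}}
print(accepts_lasso(delta, 0, {1}, "b", "ab"))
```

The search is finite because only finitely many states can occur at the boundaries between successive copies of v.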
Büchi automata learning framework based on FDFAs
We begin with an introduction to the framework for learning a BA (respectively, LDBA) recognizing an unknown ω-regular language L, as depicted in Fig. 3.
Overview of the framework. First, we assume that we already have a BA teacher who knows the unknown ω-regular language L and can answer membership and equivalence queries about L. More precisely, a membership query asks the teacher whether a given ω-word belongs to L, while an equivalence query asks whether a BA B accepts L. The BA teacher will answer
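The two query types can be pictured with a toy teacher. The Python sketch below rests on several assumptions that are not from the paper: ω-words are given as pairs (u, v) standing for u·v^ω, the hypothesis is passed as a membership predicate rather than as a BA, and the equivalence query is replaced by a bounded search that returns a differing pair as a counterexample. A real BA teacher would perform an exact equivalence check instead.

```python
from itertools import product


class ToyTeacher:
    """Toy teacher for an omega-regular language given as a predicate on (u, v)."""

    def __init__(self, alphabet, in_language, max_len=3):
        self.alphabet = alphabet
        self.in_language = in_language   # (u, v) -> bool, decides u v^omega in L
        self.max_len = max_len           # bound of the approximate search

    def membership(self, u, v):
        """Membership query: does u v^omega belong to L?"""
        return self.in_language(u, v)

    def equivalence(self, hypothesis):
        """Bounded stand-in for an equivalence query.

        Returns None if no difference is found up to max_len, otherwise a
        pair (u, v) on which hypothesis and target language disagree.
        """
        words = ["".join(w)
                 for n in range(self.max_len + 1)
                 for w in product(self.alphabet, repeat=n)]
        for u in words:
            for v in words:
                if v and hypothesis(u, v) != self.in_language(u, v):
                    return (u, v)
        return None


# Target language: words with infinitely many a's, i.e. the period contains an a.
teacher = ToyTeacher("ab", lambda u, v: "a" in v)
print(teacher.membership("b", "ab"))
print(teacher.equivalence(lambda u, v: True))
```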
Table-based learning algorithm for FDFAs
In this section, we briefly introduce the table-based FDFA learner in [1] under the assumption that we have an FDFA teacher who knows the target FDFA. It employs a structure called observation table [4] to organize the results obtained from queries and to propose candidate FDFAs. The table-based FDFA learner simultaneously runs several instances of DFA learners. The DFA learners are very similar to the L* algorithm [4], except that they use different conditions to decide if two strings belong
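To make the observation table concrete, the following Python sketch shows the classical table for learning a single DFA in the style of [4]. It is a simplified illustration under assumptions: the class and method names are invented here, only the closedness check is implemented, and the ordinary DFA membership condition is used rather than the conditions of the FDFA learner.

```python
class ObservationTable:
    """Rows are indexed by access strings, columns by distinguishing suffixes."""

    def __init__(self, alphabet, member):
        self.alphabet = alphabet
        self.member = member     # membership oracle: str -> bool
        self.upper = [""]        # access strings, one per candidate state
        self.suffixes = [""]     # experiments (columns)

    def row(self, prefix):
        return tuple(self.member(prefix + e) for e in self.suffixes)

    def unclosed(self):
        """A one-letter extension whose row matches no upper row, or None."""
        known = {self.row(s) for s in self.upper}
        for s in self.upper:
            for a in self.alphabet:
                if self.row(s + a) not in known:
                    return s + a
        return None

    def close(self):
        """Promote extensions to access strings until the table is closed."""
        while (w := self.unclosed()) is not None:
            self.upper.append(w)

    def hypothesis(self):
        """Candidate DFA (initial state, transitions, accepting states);
        states are the distinct rows of the closed table."""
        reps = {}
        for s in self.upper:
            reps.setdefault(self.row(s), s)
        trans = {(r, a): self.row(reps[r] + a)
                 for r in reps for a in self.alphabet}
        accepting = {r for r in reps if self.member(reps[r])}
        return self.row(""), trans, accepting


# Target: words over {a, b} with an even number of a's.
table = ObservationTable("ab", lambda w: w.count("a") % 2 == 0)
table.close()
print(table.upper)
```

Every cell of the table costs one membership query, which is why the table can grow large when many suffixes are needed.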
Tree-based learning algorithm for FDFAs
In this section, we provide our tree-based learning algorithm for FDFAs under the assumption that we have an FDFA teacher who knows the target FDFA. To that end, we first define the classification tree structure for FDFA learning in Sect. 6.1 and then present the tree-based learning algorithm in Sect. 6.2.
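As background for the classification tree structure, the Python sketch below shows a classical classification tree for learning a single DFA, in the spirit of the tree-based algorithm in [3]. It is an assumption-laden illustration: the class names, the sift and split operations as written here, and the plain DFA membership oracle are chosen for this example and are not the FDFA-specific definitions of Sect. 6.1.

```python
class Node:
    def __init__(self, label):
        self.label = label     # experiment (inner node) or access string (leaf)
        self.children = {}     # query outcome (bool) -> Node; empty for a leaf

    def is_leaf(self):
        return not self.children


class ClassificationTree:
    def __init__(self, member):
        self.member = member   # membership oracle: str -> bool
        self.root = Node("")   # a single leaf for the empty access string

    def sift(self, word):
        """Walk down from the root, branching on member(word + experiment)."""
        node = self.root
        while not node.is_leaf():
            outcome = self.member(word + node.label)
            if outcome not in node.children:
                node.children[outcome] = Node(word)   # new state discovered
            node = node.children[outcome]
        return node

    def split(self, leaf, experiment, new_access):
        """Replace a leaf by an inner node separating two access strings."""
        old_access = leaf.label
        old_out = self.member(old_access + experiment)
        new_out = self.member(new_access + experiment)
        assert old_out != new_out, "the experiment must separate both strings"
        leaf.label = experiment
        leaf.children = {old_out: Node(old_access), new_out: Node(new_access)}


# Target: words over {a, b} with an even number of a's.
tree = ClassificationTree(lambda w: w.count("a") % 2 == 0)
tree.split(tree.root, "", "a")
print(tree.sift("ab").label)
```

Sifting a word asks only the queries on one root-to-leaf path, whereas a table row needs one query per column.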
From FDFAs to Büchi automata
Since the FDFA teacher exploits the BA teacher for answering equivalence queries, the conversion from the conjectured FDFA into a BA is needed. Unfortunately, with the following example, we show that in general it is impossible to construct a precise BA B for an FDFA such that . Note that this result has been discussed and proved for the first time in the previous version [44] of this paper.
Example 2 Consider a non-canonical FDFA in Fig. 9, we have . We assume
Counterexample analysis for the FDFA teacher
In this section, we first show how to extract valid counterexamples for the FDFA learner from the counterexamples returned from the BA teacher and then give their correctness proofs in Sect. 8.1. Since the counterexample analysis procedure makes use of three DFAs, namely , and (see Sect. 8.1), we will give the DFA construction for in Sect. 8.2 and the constructions for and in Sect. 8.3.
Correctness and complexity analysis
In this section, we first discuss the correctness of the tree-based FDFA learning algorithm in Sect. 9.1 and then present the complexity of the algorithm in Sect. 9.2. Together with the correctness of the BA construction and of the counterexample analysis, this yields our main result, i.e., Theorem 4 in Sect. 9.2.
Experimental results
All the learning algorithms proposed in this work are implemented in the ROLL library [42] (http://iscasmc.ios.ac.cn/roll). In the library, all DFA operations are delegated to the dk.brics.automaton package, and we use the RABIT tool [54], [55] to check the equivalence of two BAs. We evaluate the performance of our learning algorithms using the smallest BAs corresponding to all the 295 LTL specifications available in Büchi Store [43], where the numbers of states in the BAs range from 1 to
Discussion and future works
Regarding our experiments, the BAs used as target automata are in general small; the average size of the input BAs is around 10 states. From our experience of applying DFA learning algorithms, tree-based algorithms perform significantly better than table-based ones when the number of states of the learned DFA is large, say more than 1000. We believe this will also apply to BA learning. Nevertheless, in our current experiments, most of the time is spent in
Declaration of Competing Interest
None declared.
Acknowledgements
We thank two anonymous reviewers for their valuable suggestions to improve the presentation of this paper. This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61761136011, 61836005, 61532019), the Guangdong Science and Technology Department (Grant No. 2018B010107004), and the MOST project No. 103-2221-E-001-019-MY3.
References (60)
- et al. Learning regular omega languages. Theor. Comput. Sci. (2016)
- Learning regular sets from queries and counterexamples. Inf. Comput. (1987)
- et al. On the learnability of infinitary regular sets. Inf. Comput. (1995)
- A syntactic congruence for rational omega-language. Theor. Comput. Sci. (1985)
- Automata on infinite objects
- et al. Angluin-style learning of NFA
- et al. An Introduction to Computational Learning Theory (1994)
- et al. Inference of finite automata using homing sequences (extended abstract)
- et al. Learning assumptions for compositional verification
- et al. Automated assume-guarantee reasoning for simulation conformance
- Learning minimal separating dfa's for compositional verification
- Learning-Based Compositional Model Checking of Behavioral UML Systems
- Learning assumptions for compositional verification of timed systems. IEEE Trans. Softw. Eng.
- Symbolic compositional verification by learning assumptions
- Automated learning of probabilistic assumptions for compositional reasoning
- Leveraging weighted automata in compositional reasoning about concurrent probabilistic systems
- Black box checking. J. Autom. Lang. Comb.
- Model generation by moderated regular extrapolation
- Evolving a test oracle in black-box testing
- Synthesis of interface specifications for Java classes
- Hybrid learning: interface generation through static, dynamic, and symbolic analysis
- Symbolic learning of component interfaces
- TLV: abstraction through testing, learning, and validation
- Symbolic Learning of Component Interfaces
- Learning the language of error
- PAC learning-based verification and model synthesis
- Tzuyu: learning stateful typestates
- Model learning. Commun. ACM
- Libalf: the automata learning framework
- The open-source learnlib - a framework for active automata learning