A novel learning algorithm for Büchi automata based on family of DFAs and classification trees

https://doi.org/10.1016/j.ic.2020.104678

Abstract

In this paper, we propose a novel algorithm to learn a Büchi automaton from a teacher who knows an ω-regular language. The learned Büchi automaton can be a nondeterministic Büchi automaton or a limit deterministic Büchi automaton. The learning algorithm is based on learning a formalism called a family of DFAs (FDFAs), recently proposed by Angluin and Fisman. Its key novelty is the use of a classification tree structure instead of the standard observation table structure. The worst-case storage space required by our algorithm is quadratically better than that required by the table-based algorithm proposed by Angluin and Fisman. We implement the proposed learning algorithms in the learning library ROLL (Regular Omega Language Learning), which also includes the other complete ω-regular learning algorithms available in the literature. Experimental results show that our tree-based learning algorithms perform best in terms of the number of solved learning tasks.

Introduction

In the past two decades, learning-based automata inference techniques [2], [3], [4], [5] have received significant attention from the formal verification community. In general, the primary applications of automata learning techniques in the community fall into two categories: improving the efficiency and scalability of verification techniques [6], [7], [8], [9], [10], [11], [12], [13] and synthesizing abstract system models for further analysis [14], [15], [16], [17], [18], [19], [20], [21], [22], [23].

The former is usually based on the so-called assume-guarantee compositional verification approach, which divides a verification task into several subtasks via a composition rule; learning algorithms are applied to automatically construct environmental assumptions for the components in the rule. For the latter, automata learning algorithms have been used to automatically generate interface models of computer programs [17], [18], [19], [20], [24], to extract a model of system error traces for diagnosis purposes [22], to obtain a behavior model of programs for statistical program analysis [23], and to perform model-based testing and verification [14], [15], [16]. Later, Vaandrager [25] explained the concept of model learning used in the above applications. In particular, some robust libraries for finite automata learning are available in the literature, e.g., libalf [26] and LearnLib [27].

Besides the classical finite automata learning algorithms, learning algorithms for richer models have also been developed and applied in the above two settings. For example, learning algorithms for register automata [28], [29] have been developed and applied to synthesizing program interface models. For timed automata, learning algorithms have been developed to automate the compositional verification of timed systems [10] and to verify specifications of the TCP protocol [30]. However, all the above results concern checking safety properties or synthesizing finite behavior models of systems and programs. Büchi automata are a standard model for describing liveness properties of distributed systems [31] and have been widely applied in the automata-based model checking framework [32] to describe properties to be verified, as well as in the synthesis of reactive systems [33]. Moreover, Büchi automata have been used as a means to prove the termination of programs [34]. Therefore, to verify liveness properties of a system in a learning-based setting, a learning algorithm for Büchi automata is needed.

Motivated by that, Maler and Pnueli introduced in [35] the first learning algorithm for Büchi automata, which is, however, only able to learn a strict subclass of ω-regular languages. The first learning algorithm for Büchi automata accepting the complete class of ω-regular languages was described in [36], based on the L* algorithm [4] and the result of [37]. However, unlike the case of finite automata learning, research on applying Büchi learning algorithms to verification problems is still in its infancy, despite the popularity of Büchi automata in the community.

One reason why learning algorithms for Büchi automata have seldom been used is that they are currently not efficient enough for model checking. Recently, Angluin and Fisman proposed a learning algorithm in [1] that learns a formalism called a family of DFAs (FDFAs), based on the results of [38]. The main barrier to applying their learning algorithm in verification is that it requires a teacher for FDFAs. To the best of our knowledge, FDFAs have not yet been applied in verification, while Büchi automata have already been used in several areas such as program termination analysis [39] and probabilistic verification [40]. As a main contribution, in this paper we show that the FDFA learning algorithm in [1] can be adapted to support Büchi automata teachers.

To further improve the efficiency of Büchi learning algorithms, in this paper we propose a novel learning algorithm for Büchi automata accepting the complete class of ω-regular languages, based on FDFAs and a classification tree structure (inspired by the tree-based L* algorithm in [3] and the TTT learning algorithm in [41]). The worst-case storage space required by our algorithm is quadratically better than that of the table-based algorithm proposed in [1]. We implement our learning algorithm for Büchi automata in the library ROLL [42] (Regular Omega Language Learning, http://iscasmc.ios.ac.cn/roll), which includes all other Büchi automata learning algorithms for the complete class of ω-regular languages available in the literature. We compare the performance of those algorithms using a benchmark of 295 Büchi automata corresponding to all 295 LTL specifications available in Büchi Store [43], as well as 20 Büchi automata whose languages cannot be specified by LTL formulas. Experimental results show that our tree-based algorithms perform best in terms of the number of solved learning tasks.

To summarize, our contributions are the following: (1) adapting the algorithm in [1] to support Büchi automata teachers; (2) a novel Büchi automata learning algorithm for the complete class of ω-regular languages based on FDFAs and classification trees; and (3) a comprehensive empirical evaluation, with ROLL, of all the Büchi automata learning algorithms available in the literature.

A previous version of our learning algorithm appeared in [44]. Compared to that version, we have added more examples and intuitions about the proposed learning algorithms. For instance, we have added Fig. 2 in order to give readers an idea of the three different types of canonical FDFAs. We have also provided detailed proofs and complexity analysis. Many of the proofs given here are nontrivial; we include them in the hope that readers may benefit from the underlying ideas in their own work.

Another contribution of this paper is that we extend the learning algorithm for Büchi automata proposed in [44] to a learning algorithm for limit deterministic Büchi automata. Limit deterministic Büchi automata are a class of Büchi automata introduced in [40], [45] for the qualitative verification of Markov Decision Processes (MDPs). More precisely, our learned limit deterministic Büchi automata have two components, namely the initial component and the accepting component, both of which are deterministic, and all accepting states are contained in the accepting component. Nondeterminism occurs only on the transitions from the initial component to the accepting component. We are aware that the same kind of Büchi automata is also defined in [46]. Moreover, limit deterministic Büchi automata are widely used in program termination analysis according to [39], [47]. Therefore, it is intriguing to see whether we can apply our learning algorithm in probabilistic verification and program analysis; we leave this to future work.

Section snippets

Preliminaries

Let ⊕ be the standard modular arithmetic operator. Let A and B be two sets. We use A ⊖ B to denote their symmetric difference, i.e., the set (A ∖ B) ∪ (B ∖ A). We use [i⋯j] to denote the set {i, i+1, ⋯, j}. Let Σ be a finite non-empty set of letters called an alphabet. A word is a finite or infinite sequence w = w[1]w[2]⋯ of letters in Σ. We use ϵ to represent the empty word. The set of all finite words is denoted by Σ*, and the set of all infinite words, called ω-words, is denoted by Σ^ω. Moreover, we also
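As an illustrative aside (not part of the paper), the set and word notation above can be made concrete with a short Python sketch; the function names are ours:

```python
# Illustration of the notation above: the symmetric difference A ⊖ B, and an
# ultimately periodic ω-word u·v^ω represented by its prefix u and period v.

def sym_diff(a: set, b: set) -> set:
    """A ⊖ B = (A \\ B) ∪ (B \\ A)."""
    return (a - b) | (b - a)

def up_word_prefix(u: str, v: str, n: int) -> str:
    """The first n letters of the ultimately periodic ω-word u·v^ω."""
    s = u
    while len(s) < n:
        s += v
    return s[:n]

print(sym_diff({1, 2, 3}, {2, 3, 4}))   # {1, 4}
print(up_word_prefix("a", "ab", 7))     # 'aababab'
```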

Representations of ω-regular languages

The first representation of ω-regular languages introduced here is Büchi automata, which were originally introduced by Julius Richard Büchi in [49] and have since been widely used in model checking. A Büchi automaton (BA) has the same structure as an FA, except that it accepts only infinite words. A run of a BA on an infinite word is defined similarly to that of an FA, except that instead of ending in a state, it visits an infinite sequence of states. An infinite word w is accepted by a BA A iff A

Büchi automata learning framework based on FDFAs

We begin with an introduction to the framework for learning a BA (respectively, an LDBA) recognizing an unknown ω-regular language L, as depicted in Fig. 3.

Overview of the framework  First, we assume that we already have a BA teacher who knows the unknown ω-regular language L and can answer membership and equivalence queries about L. More precisely, a membership query Mem^BA(u·v^ω) asks the teacher whether u·v^ω ∈ L, while an equivalence query Equ^BA(B) asks whether a BA B accepts exactly L. The BA teacher will answer
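The query interface sketched above can be written down as follows. All names here (BATeacher, mem, equ) are illustrative and not taken from ROLL's API; the toy teacher at the end answers membership queries for one fixed example language:

```python
# Sketch of a BA teacher interface. An ultimately periodic ω-word u·v^ω is
# encoded by the pair (u, v); equivalence queries return None or such a pair.
from abc import ABC, abstractmethod
from typing import Optional, Tuple

Counterexample = Tuple[str, str]  # (u, v) encodes the ω-word u·v^ω

class BATeacher(ABC):
    @abstractmethod
    def mem(self, u: str, v: str) -> bool:
        """Membership query: is u·v^ω in the target language L?"""

    @abstractmethod
    def equ(self, conjecture) -> Optional[Counterexample]:
        """Equivalence query: None if the conjectured BA accepts exactly L,
        otherwise an ultimately periodic counterexample."""

class InfinitelyManyAs(BATeacher):
    """Toy teacher for L = ω-words over {a, b} with infinitely many a's.
    An ultimately periodic word u·v^ω is in L iff its period v contains 'a'."""
    def mem(self, u: str, v: str) -> bool:
        return "a" in v
    def equ(self, conjecture) -> Optional[Counterexample]:
        raise NotImplementedError  # would need BA language-equivalence checking
```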

Table-based learning algorithm for FDFAs

In this section, we briefly introduce the table-based FDFA learner of [1] under the assumption that we have an FDFA teacher who knows the target FDFA. It employs a structure called an observation table [4] to organize the results obtained from queries and to propose candidate FDFAs. The table-based FDFA learner simultaneously runs several instances of DFA learners. The DFA learners are very similar to the L* algorithm [4], except that they use different conditions to decide if two strings belong
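As a rough illustration of this bookkeeping, here is a minimal observation-table sketch in the style of Angluin's L* for plain DFAs; the FDFA learner of [1] generalizes the row-equality test, and all names below are ours:

```python
# Minimal L*-style observation table: rows are indexed by candidate state
# representatives (prefixes), columns by distinguishing experiments (suffixes).
# Two prefixes are merged into one state iff their rows agree.

class ObservationTable:
    def __init__(self, alphabet, membership):
        self.alphabet = alphabet
        self.mem = membership        # membership oracle: str -> bool
        self.prefixes = {""}         # candidate state representatives
        self.suffixes = {""}         # distinguishing experiments

    def row(self, p: str) -> tuple:
        """The row of prefix p: membership answers for p + each suffix."""
        return tuple(self.mem(p + s) for s in sorted(self.suffixes))

    def find_unclosed(self):
        """Return a one-letter extension whose row matches no prefix row,
        or None if the table is closed."""
        rows = {self.row(p) for p in self.prefixes}
        for p in self.prefixes:
            for a in self.alphabet:
                if self.row(p + a) not in rows:
                    return p + a
        return None

    def close(self):
        """Add extensions until the table is closed."""
        while (x := self.find_unclosed()) is not None:
            self.prefixes.add(x)
```

For the language of finite words ending in 'a', closing the table from scratch discovers exactly the two states of the minimal DFA, represented by the prefixes "" and "a".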

Tree-based learning algorithm for FDFAs

In this section, we present our tree-based learning algorithm for FDFAs under the assumption that we have an FDFA teacher knowing the target FDFA. To that end, we first define the classification tree structure for FDFA learning in Sect. 6.1 and then present the tree-based learning algorithm in Sect. 6.2.
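For intuition, a classification tree in the style of the tree-based learners of [3], [41] can be sketched as follows for plain DFA learning: internal nodes hold experiments, leaves hold access strings, and "sifting" a string routes it through the tree by membership answers (the paper's FDFA learners replace this plain membership test with learner-specific equality conditions). The names below are ours:

```python
# Minimal classification (discrimination) tree sketch. Each internal node is
# labeled with an experiment; each leaf with an access string representing a
# state. Sifting word w branches on mem(w + experiment) at every internal node.

class Node:
    def __init__(self, label, is_leaf):
        self.label = label       # experiment (internal) or access string (leaf)
        self.is_leaf = is_leaf
        self.children = {}       # membership outcome (bool) -> child Node

def sift(root: Node, word: str, mem) -> Node:
    """Descend from the root, branching on mem(word + experiment),
    until a leaf (the state word belongs to) is reached."""
    node = root
    while not node.is_leaf:
        outcome = mem(word + node.label)
        node = node.children[outcome]
    return node
```

For the language of finite words ending in 'a', a tree with the single experiment ϵ at the root and leaves "" and "a" already separates the two states: sifting any word ending in 'a' reaches the leaf "a", and any other word reaches the leaf "".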

From FDFAs to Büchi automata

Since the FDFA teacher exploits the BA teacher to answer equivalence queries, a conversion from the conjectured FDFA into a BA is needed. Unfortunately, with the following example, we show that in general it is impossible to construct a precise BA B for an FDFA F such that UP(L(B)) = UP(F). Note that this result was first discussed and proved in the previous version [44] of this paper.

Example 2

Consider the non-canonical FDFA F in Fig. 9; we have UP(F) = ⋃_{n=0}^∞ {a,b}* · (ab^n)^ω. We assume

Counterexample analysis for the FDFA teacher

In this section, we first show how to extract valid counterexamples for the FDFA learner from the counterexamples returned by the BA teacher and then give the correctness proofs in Sect. 8.1. Since the counterexample analysis procedure makes use of three DFAs, namely D_{u$v}, D_1 and D_2 (see Sect. 8.1), we give the DFA construction for D_{u$v} in Sect. 8.2 and the constructions for D_1 and D_2 in Sect. 8.3.

Correctness and complexity analysis

In this section, we first discuss the correctness of the tree-based FDFA learning algorithm in Sect. 9.1 and then present the complexity of the algorithm in Sect. 9.2. Together with the correctness of the BA construction and the counterexample analysis, our main result follows, i.e., Theorem 4 in Sect. 9.2.

Experimental results

All the learning algorithms proposed in this work are implemented in the ROLL library [42] (http://iscasmc.ios.ac.cn/roll). In the ROLL library, all DFA operations are delegated to the dk.brics.automaton package, and we use the RABIT tool [54], [55] to check the equivalence of two BAs. We evaluate the performance of our learning algorithms using the smallest BAs corresponding to all the 295 LTL specifications available in Büchi Store [43], where the numbers of states in the BAs range from 1 to

Discussion and future works

Regarding our experiments, the BAs used as target automata are in general small; the average size of the input BAs is around 10 states. From our experience of applying DFA learning algorithms, the performance of tree-based algorithms is significantly better than that of table-based ones when the number of states of the learned DFA is large, say more than 1000. We believe this will also apply to the case of BA learning. Nevertheless, in our current experiments, most of the time is spent in

Declaration of Competing Interest

None declared.

Acknowledgements

We thank two anonymous reviewers for their valuable suggestions to improve the presentation of this paper. This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61761136011, 61836005, 61532019), the Guangdong Science and Technology Department (Grant No. 2018B010107004), and the MOST project No. 103-2221-E-001-019-MY3.

References (60)

  • Y. Chen et al.

Learning minimal separating DFAs for compositional verification

  • O. Grumberg et al.

    Learning-Based Compositional Model Checking of Behavioral UML Systems

    (2016)
  • S.-W. Lin et al.

Learning assumptions for compositional verification of timed systems

    IEEE Trans. Softw. Eng.

    (2014)
  • R. Alur et al.

    Symbolic compositional verification by learning assumptions

  • L. Feng et al.

    Automated learning of probabilistic assumptions for compositional reasoning

  • F. He et al.

    Leveraging weighted automata in compositional reasoning about concurrent probabilistic systems

  • D.A. Peled et al.

    Black box checking

    J. Autom. Lang. Comb.

    (2002)
  • A. Hagerer et al.

    Model generation by moderated regular extrapolation

  • F. Wang et al.

    Evolving a test oracle in black-box testing

  • R. Alur et al.

    Synthesis of interface specifications for Java classes

  • F. Howar et al.

    Hybrid learning: interface generation through static, dynamic, and symbolic analysis

  • D. Giannakopoulou et al.

    Symbolic learning of component interfaces

  • J. Sun et al.

    TLV: abstraction through testing, learning, and validation

  • D. Giannakopoulou et al.

    Symbolic Learning of Component Interfaces

    (2012)
  • M. Chapman et al.

    Learning the language of error

  • Y. Chen et al.

    PAC learning-based verification and model synthesis

  • H. Xiao et al.

    Tzuyu: learning stateful typestates

  • F.W. Vaandrager

    Model learning

    Commun. ACM

    (2017)
  • B. Bollig et al.

    Libalf: the automata learning framework

The open-source LearnLib - a framework for active automata learning
