Detecting useless transitions in pushdown automata

https://doi.org/10.1016/j.ic.2020.104612

Abstract

Pushdown automata may contain transitions that are never used in any accepting run of the automaton. We present an algorithm for detecting such useless transitions. A finite automaton that captures the possible stack contents during runs of the pushdown automaton is first constructed in a forward procedure to determine which transitions are reachable, and is then employed in a backward procedure to determine which of these transitions can lead to a final state. An implementation of the algorithm is shown to exhibit favorable performance.

Introduction

Context-free languages are used in language specification, parsing, and code optimization. They are defined by means of a context-free grammar or a pushdown automaton (PDA). Some languages can be specified more efficiently by a PDA than by a context-free grammar, as shown by Goldstine, Price, and Wotschke [6]. PDAs are at the root of deterministic parsers for context-free languages (notably LL, LR), see e.g. [13], [1]. We consider PDAs in which any number of symbols can be popped from as well as pushed onto the stack in one transition. Popping zero or multiple symbols is useful in bottom-up parsing, and facilitates the reversal of a PDA.
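To make this transition model concrete, the following minimal Python sketch applies a transition that pops and pushes whole sequences of stack symbols, and reverses such a transition by swapping its states and its pop and push sequences. The names `Transition`, `step` and `reverse`, the encoding of ε as the empty string, and the convention that the top of the stack is the right end of a tuple are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import NamedTuple, Optional, Tuple

Stack = Tuple[str, ...]  # assumption: top of the stack is the last element


class Transition(NamedTuple):
    source: str
    symbol: str   # "" encodes the empty string ε (hypothetical encoding)
    pop: Stack    # zero or more symbols removed from the top
    target: str
    push: Stack   # zero or more symbols placed on top


def step(t: Transition, state: str, stack: Stack) -> Optional[Tuple[str, Stack]]:
    """Return the successor configuration, or None if t is not enabled."""
    if state != t.source or len(t.pop) > len(stack):
        return None
    keep = len(stack) - len(t.pop)
    if stack[keep:] != t.pop:       # the popped sequence must match the top
        return None
    return t.target, stack[:keep] + t.push


def reverse(t: Transition) -> Transition:
    """Swap states and swap pop/push, so the reversed transition undoes t."""
    return Transition(t.target, t.symbol, t.push, t.source, t.pop)


t = Transition("p", "a", ("A", "B"), "q", ("C",))
after = step(t, "p", ("Z", "A", "B"))
back = step(reverse(t), "q", ("Z", "C"))
```

Because a transition may pop several symbols, the reverse of a transition that pushes several symbols is again a single transition of the same kind, which is one way to read the remark about reversal above.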

For context-free grammars, it is rather straightforward to determine whether a production is useless, i.e., cannot occur in a derivation from the start variable to a string of terminal symbols; such a method is discussed in many textbooks on formal languages (e.g., [12, Theorem 6.2]). It consists of two parts: detect which variables are reachable from the start variable, and which variables can be transformed into a string of terminal symbols. Productions that contain a useless variable, i.e., one that does not satisfy both properties, can be removed from the grammar without changing the associated language. A grammar generated from a program sometimes consists almost entirely of useless productions, as in the case of “parsing by intersection” [10].
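The two-part textbook method can be sketched as follows. This is a minimal illustration, not code from the paper or from [12]; the function name `useful_productions` and the grammar encoding (a list of `(head, body)` pairs plus an explicit set of variables) are assumptions. The sketch computes the generating variables first and the reachable variables second, on the already pruned grammar, since the opposite order can leave useless productions behind.

```python
def useful_productions(productions, variables, start):
    """Return the productions that occur in some derivation of a terminal string.

    productions: list of (head, body), body being a tuple of symbols.
    variables:   set of variable names; every other symbol is a terminal.
    """
    def body_ok(body, generating):
        return all(s not in variables or s in generating for s in body)

    # Part 1: variables that can be transformed into a string of terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and body_ok(body, generating):
                generating.add(head)
                changed = True
    live = [(h, b) for h, b in productions
            if h in generating and body_ok(b, generating)]

    # Part 2: variables reachable from the start variable in the pruned grammar.
    reachable = {start}
    work = [start]
    while work:
        v = work.pop()
        for head, body in live:
            if head == v:
                for s in body:
                    if s in variables and s not in reachable:
                        reachable.add(s)
                        work.append(s)
    return [(h, b) for h, b in live if h in reachable]


grammar = [
    ("S", ("a", "S", "b")),
    ("S", ()),
    ("S", ("A", "C")),   # useless: A never yields a terminal string
    ("A", ("A", "a")),   # useless: A is not generating
    ("B", ("b",)),       # useless: B is not reachable from S
    ("C", ("c",)),       # useless: C is only reachable through A
]
kept = useful_productions(grammar, {"S", "A", "B", "C"}, "S")
```

In this example only the two productions `S -> a S b` and `S -> ε` survive; `C -> c` is dropped because the only production mentioning `C` also mentions the non-generating variable `A`.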

This paper addresses the research question posed in [10]: to develop an efficient algorithm for determining the useless transitions in a PDA, where a transition is useless if no run of the PDA from the initial configuration to a final state includes it. Such a transition can be removed from the PDA without changing the accepted language, and removing it improves the performance of running the PDA. This is especially worthwhile if the PDA has been generated automatically, because such PDAs tend to contain a substantial number of useless transitions. Similar to detecting useless variables in context-free grammars, our algorithm for detecting useless transitions in a PDA consists of two parts. It stays entirely in the realm of automata. The first part finds which transitions are not reachable from the initial configuration. Here we exploit an algorithm by Finkel, Willems and Wolper [5] to construct a nondeterministic finite automaton (NFA) that captures exactly all possible stacks in the reachable configurations of a PDA. Their approach is modified slightly to take into account that multiple symbols may be popped from the stack at once. The second part of our algorithm, which to the best of our knowledge is novel, finds the transitions after which it is impossible to reach a final state. Here we use the NFA constructed in the first part to compute in a backward fashion which transitions can lead to a final state in the PDA.

We prove that the algorithm marks exactly the useless transitions. The worst-case time complexity of the algorithm is O(Q⁴T), with Q the number of states and T the number of transitions of the PDA. This worst case actually only occurs in the unlikely case that the NFA is constructed over a large number of iterations, is saturated with ε-transitions, and contains a lot of backward nondeterminism. A prototype implementation of the algorithm exhibits good performance.

An alternative approach is to use the functions post⁎ and pre⁎ to compute the reachable configurations as well as the configurations from which a final state can be reached. This alternative approach was also implemented, and a large test set of randomly generated PDAs showed that the algorithm presented in this paper performs much better than this alternative approach.

An earlier version of this paper appeared as [3]. In comparison to that paper, the current paper contains the correctness proofs of our algorithm, discusses the alternative algorithms based on post⁎ and pre⁎ in depth, and presents a more thorough experimental comparison of the algorithms.

Related work  Bouajjani, Esparza and Maler [2] employed a method similar to the one in [5] to capture the reachable configurations of a PDA via an NFA, in the context of model checking infinite-state systems. Griffin [9] showed how to detect which transitions are reachable from the initial configuration in a deterministic PDA (DPDA). For each transition, the algorithm creates a temporary DPDA in which the successive state of the transition is set to a new, final state; all other states in the temporary DPDA are made non-final. Then it is checked whether the language generated by the DPDA is empty; if it is, the transition is unreachable. This algorithm determines which transitions are reachable from the initial configuration, but not which transitions can lead to a final state. Vice versa, Kutrib and Malcher [11] studied reversibility of DPDAs.

Goldstine, Price and Wotschke developed algorithms to optimize a PDA. If for an application it is preferable to have fewer states and more stack symbols, the number of states can be reduced at the price of extra stack symbols [7]. On the other hand, if it is better to have few stack symbols, they can be exchanged for extra states [8]. Pólach, Trávníček, Janoušek and Melichar [14] developed a mechanism to keep track of the symbols that can appear at the top of the stack for each state in a PDA, which they use for a determinization of the PDA.

Preliminaries

A nondeterministic pushdown automaton (PDA) is a 6-tuple (Q,Σ,Γ,δ,q0,F) with Q a finite set of states, Σ a finite input alphabet, Γ a finite stack alphabet, δ: Q×(Σ∪{ε})×Γ⁎ → 2^(Q×Γ⁎) a finite transition relation, q0 the initial state, and F the set of final states. Let ε denote the empty string. Note that zero or multiple symbols can be popped from the stack in one transition. It is assumed that the initial stack is empty. (An arbitrary initial stack σ can be constructed by adding a new initial

Detecting the useless transitions in a PDA

Our algorithm for detecting useless transitions in a PDA summarizes all reachable configurations of the PDA in an NFA. As a first step, an NFA is constructed that accepts the stacks that can occur during any run of the PDA. A second step determines which transitions can lead to a configuration from which a final state can be reached. Transitions that cannot be reached from the initial state (as determined in step 1) or that cannot lead to a final state (as determined in step 2) are useless.
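The criterion in the last sentence can be illustrated with a brute-force check that explores configurations explicitly. The sketch below is not the NFA-based algorithm described above: it swaps in a plain search over configurations whose stack height stays below an assumed bound `max_height`, so it can wrongly report a transition as useless when every accepting run through it needs a higher stack. It is only meant as a sanity check on small examples. The function name, the tuple encoding of transitions `(source, symbol, pop, target, push)`, and the convention that the top of the stack is the right end of a tuple are assumptions.

```python
from collections import deque


def useless_transitions_bounded(transitions, initial, finals, max_height):
    """Indices of transitions used in no accepting run within the stack bound."""
    start = (initial, ())            # the initial stack is empty
    seen = {start}
    edges = []                       # (source config, transition index, target config)
    queue = deque([start])

    # Step 1 (forward): collect all configurations reachable within the bound.
    while queue:
        state, stack = queue.popleft()
        for idx, (p, _symbol, pop, q, push) in enumerate(transitions):
            if p != state or len(pop) > len(stack):
                continue
            keep = len(stack) - len(pop)
            if stack[keep:] != pop:
                continue
            new_stack = stack[:keep] + push
            if len(new_stack) > max_height:
                continue
            target = (q, new_stack)
            edges.append(((state, stack), idx, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)

    # Step 2 (backward): configurations from which a final state can be reached.
    preds = {}
    for src, _idx, dst in edges:
        preds.setdefault(dst, []).append(src)
    good = {c for c in seen if c[0] in finals}
    work = deque(good)
    while work:
        config = work.popleft()
        for src in preds.get(config, []):
            if src not in good:
                good.add(src)
                work.append(src)

    # A transition is useful if it fires from a reachable configuration
    # into a configuration that can still reach a final state.
    useful = {idx for _src, idx, dst in edges if dst in good}
    return set(range(len(transitions))) - useful


example = [
    (0, "a", (), 0, ("A",)),      # 0: push A
    (0, "b", ("A",), 1, ()),      # 1: pop A, switch to state 1
    (1, "b", ("A",), 1, ()),      # 2: pop A
    (1, "", (), 2, ()),           # 3: ε-move to the final state 2
    (1, "c", ("B",), 2, ()),      # 4: B is never pushed, so never enabled
    (0, "d", (), 3, ()),          # 5: state 3 cannot reach a final state
]
result = useless_transitions_bounded(example, 0, {2}, 3)
```

In this small example, transition 4 fails the first condition (it is not reachable) and transition 5 fails the second (it is reachable but cannot lead to a final state).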

Alternative approaches

We discuss three alternative approaches to detect useless transitions in PDAs.

One approach is to transform the PDA under consideration into an equivalent context-free grammar, and then determine the useless productions. A disadvantage is that the resulting grammar tends to be much larger than the original PDA.

A second approach is to check for each transition separately whether it is useless: provide the transition with a special input symbol ξ, all other transitions in the PDA with empty

Implementation and performance comparison

We assessed an implementation of our algorithm with a test suite of randomly generated PDAs. The only real-world PDA, with 294 transitions, was obtained from the grammar of the programming language C. This resulted in an NFA with 339 states and 1030 transitions, of which 695 are ε-transitions, and took 3.56 seconds on a 2 GHz processor.

Achieving this performance required two optimizations, both limiting the influence of ε-transitions. The first concerns determining the set of states leading to q in

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Javier Esparza proposed the alternative approach using pre⁎ and post⁎. Jörg Endrullis provided a useful suggestion for the efficient implementation of this alternative approach.
