Preimage problems for deterministic finite automata

https://doi.org/10.1016/j.jcss.2020.08.002

Abstract

Given a subset of states S of a deterministic finite automaton and a word w, the preimage is the subset of all states mapped to a state in S by the action of w. We study three natural problems concerning words giving certain preimages. The first problem is whether, for a given subset, there exists a word extending the subset (giving a larger preimage). The second problem is whether there exists a totally extending word (giving the whole set of states as a preimage)—equivalently, whether there exists an avoiding word for the complementary subset. The third problem is whether there exists a resizing word. We also consider variants where the length of the word is upper bounded, where the size of the given subset is restricted, and where the automaton is strongly connected, synchronizing, or binary. We conclude with a summary of the complexities in all combinations of the cases.

Introduction

A deterministic finite complete (semi)automaton $\mathcal{A}$ is a triple $(Q,\Sigma,\delta)$, where $Q$ is the set of states, $\Sigma$ is the input alphabet, and $\delta\colon Q\times\Sigma\to Q$ is the transition function. We extend $\delta$ to a function $Q\times\Sigma^*\to Q$ in the usual way. Throughout the paper, by $n$ we always denote the number of states $|Q|$.

When the context is clear, given a state $q\in Q$ and a word $w\in\Sigma^*$, we write shortly $qw$ for $\delta(q,w)$. Given a subset $S\subseteq Q$, the image of $S$ under the action of a word $w\in\Sigma^*$ is $Sw=\delta(S,w)=\{qw \mid q\in S\}$. The preimage is $Sw^{-1}=\delta^{-1}(S,w)=\{q\in Q \mid qw\in S\}$. If $S=\{q\}$, then we usually simply write $qw^{-1}$.

We say that a word $w$ compresses a subset $S$ if $|Sw|<|S|$, avoids $S$ if $(Qw)\cap S=\emptyset$, extends $S$ if $|Sw^{-1}|>|S|$, and totally extends $S$ if $Sw^{-1}=Q$. A subset $S$ is compressible, avoidable, extensible, and totally extensible, if there is a word that, respectively, compresses, avoids, extends and totally extends it.
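The definitions above translate almost literally into code. The following is a minimal sketch, assuming a small hypothetical automaton (the 4-state Černý automaton with states 0..3, not the automaton of Fig. 1); the names `DELTA`, `act`, `image`, `preimage` and the four predicates are illustrative and do not come from the paper.

```python
# Hypothetical example automaton (4-state Cerny automaton), NOT the one in Fig. 1.
# DELTA[x][q] is the state reached from state q by reading letter x.
DELTA = {"a": [1, 2, 3, 0], "b": [0, 1, 2, 0]}
Q = frozenset(range(4))


def act(q, w):
    """Return qw, the state reached from q after reading the word w."""
    for x in w:
        q = DELTA[x][q]
    return q


def image(S, w):
    """Sw = {qw | q in S}."""
    return frozenset(act(q, w) for q in S)


def preimage(S, w):
    """Sw^{-1} = {q in Q | qw in S}."""
    return frozenset(q for q in Q if act(q, w) in S)


def compresses(w, S):
    return len(image(S, w)) < len(S)


def avoids(w, S):
    # (Qw) and S have an empty intersection.
    return not (image(Q, w) & S)


def extends(w, S):
    return len(preimage(S, w)) > len(S)


def totally_extends(w, S):
    return preimage(S, w) == Q
```

In this example automaton the letter `b` maps both 0 and 3 to 0, so `b` extends `{0}`, compresses `{0, 3}`, and avoids `{3}`.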

Remark 1

A word $w\in\Sigma^*$ is avoiding for $S\subseteq Q$ if and only if $w$ is totally extending for $Q\setminus S$.

Fig. 1 shows an example automaton. For $S=\{2,3\}$, the shortest compressing word is $aab$, and we have $\{2,3\}aab=\{1\}$, while the shortest extending word is $ba$, and we have $\{2,3\}(ba)^{-1}=\{1,2\}b^{-1}=\{1,2,4\}$.

Note that the preimage of a subset under the action of a word can be smaller than the subset. In this case, we say that the word shrinks the subset (not to be confused with compressing, where the image is considered). For example, in Fig. 1, the subset $\{3,4\}$ is shrunk by $b$ to the subset $\{4\}$.

Note that shrinking a subset is equivalent to extending its complement. Similarly, a word totally extending a subset also shrinks its complement to the empty set.

Remark 2

$|Sw^{-1}|>|S|$ if and only if $|(Q\setminus S)w^{-1}|<|Q\setminus S|$, and $Sw^{-1}=Q$ if and only if $(Q\setminus S)w^{-1}=\emptyset$.

Therefore, avoiding a subset is equivalent to shrinking it to the empty set.
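The step behind Remark 2 is that taking preimages commutes with taking complements: since the automaton is complete and deterministic, every state is mapped by $w$ either into $S$ or into $Q\setminus S$, but not both. A short derivation in the notation above:

```latex
\begin{align*}
(Q\setminus S)w^{-1} &= \{q\in Q \mid qw\notin S\} = Q\setminus (Sw^{-1}),\\
|(Q\setminus S)w^{-1}| &= n-|Sw^{-1}|,\\
|Sw^{-1}|>|S| &\iff n-|Sw^{-1}| < n-|S| \iff |(Q\setminus S)w^{-1}|<|Q\setminus S|,\\
Sw^{-1}=Q &\iff Q\setminus (Sw^{-1})=\emptyset \iff (Q\setminus S)w^{-1}=\emptyset.
\end{align*}
```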

The rank of a word $w$ is the cardinality of the image $Qw$. A word of rank 1 is called reset or synchronizing, and an automaton that admits a reset word is called synchronizing. Also, for a subset $S\subseteq Q$, we say that a word $w\in\Sigma^*$ such that $|Sw|=1$ synchronizes $S$.

Synchronizing automata serve as transparent and natural models of various systems in many applications in different fields, e.g., in coding theory [2], [3], model testing of reactive systems [4], robotics [5], and biocomputing [6]. They also reveal interesting connections with many parts of mathematics. For example, some of the recent works involve group theory [7], representation theory [8], computational complexity [9], optimization and convex geometry [10], regular languages and universality [11], approximability [12], primitive sets of matrices [13], and graph theory [14]. For a brief introduction to the theory of synchronizing automata we refer the reader to an excellent, though quite outdated, survey [15].

The famous Černý conjecture [16], which was formally stated in 1969 during a conference [15], is one of the most longstanding open problems in automata theory. It states that a synchronizing automaton has a reset word of length at most $(n-1)^2$. The currently best upper bound is cubic and has been improved recently [17] (cf. [18]). Besides the conjecture, algorithmic issues are also important. Unfortunately, the problem of finding a shortest reset word is computationally hard [19], [9], and also its length approximation remains hard [12]. We also refer to surveys [4], [15] dealing with algorithmic issues and the Černý conjecture.

Compressing and extending a subset in general play a crucial role in the synchronization of automata and related areas. In fact, all known algorithms finding a reset word use, as subprocedures, the search for words that either compress or extend a subset (e.g. [20], [21], [19], [22], [23]). Moreover, probably all proofs of upper bounds on the length of the shortest reset words use bounds on the length of words that compress (e.g. [20], [24], [21], [25], [19], [26], [18], [27], [28]) or extend (e.g. [29], [30], [21], [31], [32], [33], [18]) some subsets.

In this paper, we study several problems about finding a word yielding a certain preimage. We provide a systematic view of their computational complexity in various combinations of cases.

The complexities of problems related to images of a subset have been well studied. It is known that given an automaton $\mathcal{A}$ and a subset $S\subseteq Q$, determining whether there is a word that synchronizes it is PSPACE-complete [34]. The same holds even for strongly connected binary automata [35].

On the other hand, checking whether the automaton is synchronizing, i.e. whether there is a word that synchronizes $Q$, can be solved in $O(|\Sigma|n^2)$ time and space [16], [19], [15] and in $O(n)$ average time and space when the automaton is randomly chosen [36]. To this end, we verify whether all pairs of states are compressible. Using the same algorithm, we can determine whether a given subset is compressible.
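The pair-based test can be sketched as a backward search in the graph of unordered pairs of states. This is a minimal illustration, assuming states are numbered 0..n-1 and the transition function is stored as `delta[x][q]`; the function names and the example automaton `CERNY` are assumptions made for this sketch, not taken from the cited works.

```python
from collections import deque
from itertools import combinations

# Hypothetical example: the 4-state Cerny automaton.
CERNY = {"a": [1, 2, 3, 0], "b": [0, 1, 2, 0]}


def compressible_pairs(delta, n):
    """Return all pairs (p, q), p < q, that some word maps to a single state."""
    pred = {}      # target pair -> list of pairs mapped onto it by one letter
    good = set()   # pairs already known to be compressible
    queue = deque()
    for p, q in combinations(range(n), 2):
        for row in delta.values():
            tp, tq = sorted((row[p], row[q]))
            if tp == tq:
                # A single letter already merges p and q.
                if (p, q) not in good:
                    good.add((p, q))
                    queue.append((p, q))
            else:
                pred.setdefault((tp, tq), []).append((p, q))
    # Propagate backwards: a pair is compressible if one letter
    # maps it onto a compressible pair.
    while queue:
        target = queue.popleft()
        for source in pred.get(target, []):
            if source not in good:
                good.add(source)
                queue.append(source)
    return good


def is_synchronizing(delta, n):
    """True if and only if every pair of states is compressible."""
    return len(compressible_pairs(delta, n)) == n * (n - 1) // 2


def is_compressible(delta, n, S):
    """A subset is compressible iff it contains a compressible pair."""
    good = compressible_pairs(delta, n)
    return any(pair in good for pair in combinations(sorted(S), 2))
```

The pair graph has $n(n-1)/2$ vertices and $|\Sigma|$ outgoing edges per vertex, which is where the $O(|\Sigma|n^2)$ bound comes from.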

Deciding whether there exists a synchronizing word of a given length is NP-complete [19] (cf. [9] for the complexity of the corresponding functional problems), even if the given automaton is binary. The NP-completeness holds even when the automaton is Eulerian and binary [37], which immediately implies that for the class of strongly connected automata the complexity is the same.

However, deciding whether there exists a word of a given length that only compresses a subset can still be solved in $O(|\Sigma|n^2)$ time, as for every pair of states we can compute a shortest word that compresses the pair.
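A word compresses a subset exactly when it merges some pair of its states, so the bounded-length question reduces to shortest merging lengths of pairs. The sketch below computes these lengths with a multi-source breadth-first search over pairs; the representation `delta[x][q]`, the function names and the example automaton are assumptions of this illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical example: the 4-state Cerny automaton.
CERNY = {"a": [1, 2, 3, 0], "b": [0, 1, 2, 0]}


def pair_merge_lengths(delta, n):
    """dist[(p, q)], p < q, is the length of a shortest w with pw == qw.

    Pairs that cannot be merged by any word are absent from the result.
    """
    pred = {}
    dist = {}
    queue = deque()
    for p, q in combinations(range(n), 2):
        for row in delta.values():
            tp, tq = sorted((row[p], row[q]))
            if tp == tq:
                if (p, q) not in dist:
                    dist[(p, q)] = 1        # merged by a single letter
                    queue.append((p, q))
            else:
                pred.setdefault((tp, tq), []).append((p, q))
    # Backward BFS: all sources at distance 1 are queued before the loop,
    # so pairs are discovered in order of increasing merging length.
    while queue:
        target = queue.popleft()
        for source in pred.get(target, []):
            if source not in dist:
                dist[source] = dist[target] + 1
                queue.append(source)
    return dist


def shortest_compressing_length(delta, n, S):
    """Length of a shortest word compressing S, or None if S is incompressible."""
    dist = pair_merge_lengths(delta, n)
    lengths = [dist[pair] for pair in combinations(sorted(S), 2) if pair in dist]
    return min(lengths) if lengths else None


def compressible_within(delta, n, S, bound):
    """Is there a word of length at most `bound` that compresses S?"""
    length = shortest_compressing_length(delta, n, S)
    return length is not None and length <= bound
```

Recording the letter used at each discovery step would also yield the compressing word itself rather than only its length.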

The problems related to images have also been studied in other settings, with respect to both the complexity and the bounds on the length of the shortest words, for example, in the case of a nondeterministic automaton [34], in the case of a partial deterministic finite automaton [38], in the partial observability setting for various kinds of automata [39], and for the reachability of a given subset in the case of a deterministic finite automaton [40], [41].

In contrast to the problems related to images (compression), the complexity of the problems related to preimages has not been thoroughly studied in the literature. In this paper, we fill this gap and give a comprehensive analysis of all basic cases. We study three families of problems. As noted before, extending is equivalent to shrinking the complementary subset, hence we need to deal only with the extending word problems. Similarly, totally extending a subset is equivalent to avoiding its complement, thus we do not need to consider avoiding a set of states separately.

Extending words: Our first family of problems is the question whether there exists an extending word (Problem 1, Problem 3, Problem 5, Problem 7, Problem 9, Problem 12 in this paper).

This is motivated by the fact that finding such a word is the basic step of the so-called extension method of finding a reset word, which is used in many proofs and also some algorithms. The extension method of finding a reset word is as follows: we start from some singleton $S_0=\{q\}$ and iteratively find extending words $w_1,\ldots,w_k$ such that $|S_0 w_1^{-1}\cdots w_i^{-1}|>|S_0 w_1^{-1}\cdots w_{i-1}^{-1}|$ for $1\le i\le k$, and where $S_0 w_1^{-1}\cdots w_k^{-1}=Q$. For finding a short reset word one needs to bound the lengths of the extending words. For instance, in the case of synchronizing Eulerian automata, the fact that there always exists an extending word of length at most $n-1$ implies the upper bound $(n-2)(n-1)+1$ on the length of the shortest reset words for this class [32] (the first extending step requires just one letter, as we can choose an arbitrary singleton). In this case, a polynomial algorithm for finding extending words has been proposed [21].
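The extension method can be sketched in a few lines. The version below finds each extending word by a brute-force breadth-first search over preimages, which may take exponential time in general; it is only an illustration and not the polynomial algorithm of [21]. The representation `delta[x][q]` and all function names are assumptions. Note the order of composition: since $S_0 w_1^{-1}\cdots w_k^{-1}=S_0(w_k\cdots w_1)^{-1}$, the resulting reset word is $w_k\cdots w_1$.

```python
from collections import deque

# Hypothetical example: the 4-state Cerny automaton.
CERNY = {"a": [1, 2, 3, 0], "b": [0, 1, 2, 0]}


def act(delta, q, w):
    """Return qw, the state reached from q after reading the word w."""
    for x in w:
        q = delta[x][q]
    return q


def shortest_extending_word(delta, n, S):
    """Return (w, S w^{-1}) for a shortest extending word w, or None.

    Brute-force BFS over preimages; exponential in the worst case.
    """
    S = frozenset(S)
    seen = {S}
    queue = deque([(S, "")])
    while queue:
        T, w = queue.popleft()              # invariant: T == S w^{-1}
        for x, row in delta.items():
            # U = T x^{-1} = S (xw)^{-1}, so the letter is prepended.
            U = frozenset(q for q in range(n) if row[q] in T)
            if len(U) > len(S):
                return x + w, U
            if U not in seen:
                seen.add(U)
                queue.append((U, x + w))
    return None


def extension_method(delta, n, q0):
    """Build a reset word mapping every state to q0, or return None."""
    S, reset = frozenset({q0}), ""
    while len(S) < n:
        found = shortest_extending_word(delta, n, S)
        if found is None:
            return None                     # the current subset is not extensible
        w, S = found
        reset = w + reset                   # reset word is w_k ... w_1
    return reset
```

For example, `extension_method(CERNY, 4, 0)` returns a word that sends every state to state 0, while for an automaton whose only letter is a cyclic permutation the function returns `None`, because no singleton is extensible there.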

Totally extending words and avoiding: We study the problem whether there exists a totally extending word (Problem 2, Problem 4, Problem 6, Problem 8, Problem 10, Problem 13 in this paper). The question of the existence of a totally extending word is equivalent to the question of the existence of an avoiding word for the complementary subset.

Totally extending words themselves can be viewed as a generalization of reset words: a word totally extending a singleton to the whole set of states $Q$ is a reset word. If we are not interested in bringing the automaton into one particular state but want it to be in any of the states from a specified subset, then this is exactly the question about a totally extending word for that subset. In view of applications of synchronization, this can be particularly useful when we deal with non-synchronizing automata, where reset words cannot be applied.

The avoiding word problem is a recent concept that is dual to synchronization: instead of being in some states, we want not to be in them. A quadratic upper bound on the length of the shortest avoiding words of a single state has been established [18], which led to an improvement of the best known upper bound on the length of the shortest reset words (see also [17] for a very recent further improvement of that upper bound). Furthermore, better upper bounds on the length of the shortest avoiding words would lead to further improvements; in particular, a subquadratic upper bound implies the upper bound $7n^3/48+o(n^3)$ on the reset threshold [42]. There is a precise conjecture that the shortest avoiding words have length at most $2n-2$ [18, Open Problem 1]. The computational complexity of the problems related to avoiding, either a single state or a subset, has not been established before. We give special attention to the problems of avoiding one state and of avoiding a small subset of states (totally extending a large subset), since they seem to be the most important in view of their applications (and, as we show, the complexity grows with the size of the subset to avoid).

Resizing: Shrinking a subset is dual to extending, i.e. shrinking a subset means extending its complement. Therefore, the complexity immediately transfers from the previous results. However, in Section 5 we consider the problem of determining whether there is a word whose inverse action results in a subset having a different size, that is, either extends the subset or shrinks it (Problem 15, Problem 16).

Interestingly, in contrast with the computationally difficult problems of finding a word that extends the subset and finding a word that shrinks the subset, for this variant there exists a polynomial algorithm finding a shortest resizing word in all cases.

We can mention that in some cases extending and shrinking words are related, and it may be enough to find either one. For instance, this is used in the so-called averaging trick, which appears in several proofs [21], [31], [32], [43].

Summary: For all the problems we consider the subclasses of strongly connected, synchronizing, and binary automata. Also, we consider the problems where an upper bound on the length of the word is additionally given in a binary form in the input. Since, in most cases, the problems are computationally hard, in Section 3 and Section 4, we consider the complexity parameterized by the size of the given subset.

Table 1 and Table 2 summarize our results together with known results about compressing words. For the cases where a polynomial algorithm exists, we put the time complexity of the best one known. All the hardness results hold also in the case of a binary alphabet.

Unbounded word length

In the first studied case, we do not have any restriction on the given subset $S$ nor on the length of the extending word. We deal with the following problems:

Problem 1 Extensible subset

Given $\mathcal{A}=(Q,\Sigma,\delta)$ and a subset $S\subseteq Q$, is $S$ extensible?

Problem 2 Totally extensible subset

Given $\mathcal{A}=(Q,\Sigma,\delta)$ and a subset $S\subseteq Q$, is $S$ totally extensible?

Theorem 3

Problem 1 and Problem 2 are PSPACE-complete, even if $\mathcal{A}$ is strongly connected.

Proof

To solve one of the problems in NPSPACE, we guess the length of a word w with the required property, and then guess the letters of w from the end.
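Guessing the letters of $w$ from the end corresponds to applying one-letter preimages to $S$ step by step. A deterministic counterpart can be sketched as an exhaustive search over all preimages reachable this way. Unlike the nondeterministic procedure, which only needs to remember the current subset, this sketch stores every visited subset and may therefore use exponential time and space; it is meant for small examples only. The representation `delta[x][q]` and the function names are assumptions.

```python
# Hypothetical example: the 4-state Cerny automaton.
CERNY = {"a": [1, 2, 3, 0], "b": [0, 1, 2, 0]}


def reachable_preimages(delta, n, S):
    """Return the set of all subsets S w^{-1} over all words w."""
    start = frozenset(S)
    seen = {start}
    stack = [start]
    while stack:
        T = stack.pop()
        for row in delta.values():
            # One-letter preimage: if T == S w^{-1}, then U == S (xw)^{-1}.
            U = frozenset(q for q in range(n) if row[q] in T)
            if U not in seen:
                seen.add(U)
                stack.append(U)
    return seen


def is_extensible(delta, n, S):
    """Problem 1: is there a word w with |S w^{-1}| > |S|?"""
    return any(len(T) > len(S) for T in reachable_preimages(delta, n, S))


def is_totally_extensible(delta, n, S):
    """Problem 2: is there a word w with S w^{-1} == Q?"""
    return frozenset(range(n)) in reachable_preimages(delta, n, S)
```

By Remark 1, `is_totally_extensible(delta, n, S)` also answers whether the complement of `S` is avoidable.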

Extending small subsets

The complexity of the extending problems is caused by an unbounded size of the given subset. Note that in the proof of PSPACE-hardness in Theorem 3 the used subsets and simultaneously their complements may grow with an instance of the reduced problem, and it is known that the problem of the emptiness of intersection can be solved in polynomial time if the number of given DFAs is fixed. Here, we study the computational complexity of the extending problems when the size of the subset is not

Extending large subsets

In this section, we consider the case where the subset S contains all except at most a fixed number of states k.

Resizing a subset

In this section we deal with the following two problems:

Problem 15 Resizable subset

Given an automaton $\mathcal{A}=(Q,\Sigma,\delta)$ and a subset $S\subseteq Q$, is $S$ resizable?

Problem 16 Resizable subset by short word

Given an automaton $\mathcal{A}=(Q,\Sigma,\delta)$, a subset $S\subseteq Q$, and an integer $\ell$ given in binary representation, is $S$ resizable by a word of length at most $\ell$?

In contrast to the cases $|Sw^{-1}|>|S|$ and $|Sw^{-1}|<|S|$, there exists a polynomial-time algorithm for both these problems. Furthermore, we prove that if $S$ is resizable, then the length of the shortest resizing words is at most $n-1$.
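One way to see why resizing is tractable is a standard linear-algebra argument; the sketch below follows that idea and is not necessarily the algorithm of the paper, whose details are not included in this excerpt. For a word $w$, let $v_w$ be the vector with $v_w[q]=|\{q\}w^{-1}|$. Then $|Sw^{-1}|=\sum_{q\in S}v_w[q]$, and appending a letter acts linearly on $v_w$, so it suffices to explore words whose vectors are linearly independent of those found earlier. All names and the representation `delta[x][q]` are assumptions.

```python
from collections import deque
from fractions import Fraction

# Hypothetical example: the 4-state Cerny automaton.
CERNY = {"a": [1, 2, 3, 0], "b": [0, 1, 2, 0]}


def shortest_resizing_word(delta, n, S):
    """Return a shortest w with |S w^{-1}| != |S|, or None if none exists."""
    S = frozenset(S)
    basis = []  # (pivot index, reduced vector) pairs in echelon form

    def add_if_independent(v):
        """Reduce v against the basis; store and report it if it is new."""
        v = [Fraction(c) for c in v]
        for pivot, b in basis:
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [vi - factor * bi for vi, bi in zip(v, b)]
        for i, c in enumerate(v):
            if c != 0:
                basis.append((i, v))
                return True
        return False

    start = [1] * n                 # vector of the empty word
    add_if_independent(start)
    queue = deque([("", start)])
    while queue:
        w, v = queue.popleft()
        for x, row in delta.items():
            # Vector of the word wx: counts move along the transitions of x.
            u = [0] * n
            for p in range(n):
                u[row[p]] += v[p]
            if sum(u[q] for q in S) != len(S):
                return w + x        # |S (wx)^{-1}| differs from |S|
            if add_if_independent(u):
                queue.append((w + x, u))
    return None
```

At most $n$ linearly independent vectors can be collected, so the queue receives at most $n$ words and every explored word has length at most $n-1$, which agrees with the length bound stated above. For Problem 16, one would additionally compare the length of the returned word with the given bound.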

To obtain a

Conclusions

We have established the computational complexity of problems related to extending words. Indirectly, our results about the complexity imply also the bounds on the length of the shortest compressing/extending words, which are of separate interest. In particular, PSPACE-hardness implies that the shortest words can be exponentially long in this case, and polynomial deterministic or nondeterministic algorithms in our proofs imply polynomial upper bounds. For example, the question about the length

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank the anonymous referee for careful reading and detailed comments. This work was supported by the Competitiveness Enhancement Program of Ural Federal University under grant No. 02.A03.21.006 (Mikhail Berlinkov), and by the National Science Centre, Poland under project number 2014/15/B/ST6/00615 (Robert Ferens) and 2017/25/B/ST6/01920 (Marek Szykuła).

References (49)

  • I.K. Rystsov, Polynomial complete problems in automata theory, Inf. Process. Lett. (1983)
  • V. Vorel, Complexity of a problem concerning reset words for Eulerian binary automata, Inf. Comput. (2017)
  • M.V. Berlinkov et al., Complexity of preimage problems for deterministic finite automata
  • J. Berstel et al., Codes and Automata (2009)
  • S. Sandberg, Homing and synchronizing sequences
  • B.K. Natarajan, An algorithmic approach to the automated design of parts orienters
  • Y. Benenson et al., DNA molecule provides a computing machine with both data and fuel, Proc. Natl. Acad. Sci. USA (2003)
  • J. Araújo et al., Between primitive and 2-transitive: synchronization and its friends, EMS Surv. Math. Sci. (2017)
  • J. Almeida et al., Representation theory of finite semigroups, semigroup radicals and formal language theory, Trans. Am. Math. Soc. (2009)
  • J. Olschewski et al., The complexity of finding reset words in finite automata
  • F. Gonze et al., On the synchronizing probability function and the triple rendezvous time for synchronizing automata, SIAM J. Discrete Math. (2016)
  • N. Rampersad et al., The computational complexity of universality problems for prefixes, suffixes, factors, and subwords of regular languages, Fundam. Inform. (2012)
  • P. Gawrychowski et al., Strong inapproximability of the shortest reset word
  • M.V. Volkov, Synchronizing automata and the Černý conjecture

A preliminary version of this work without most of the proofs was announced in [1].
