Minimal consistent DFA from sample strings

Zhang, Chenyi

doi:10.1007/s00236-020-00365-8

Minimal consistent DFA from sample strings

Original Article
Published: 06 May 2020

Volume 57, pages 657–670, (2020)
Cite this article

Acta Informatica Aims and scope Submit manuscript

Chenyi Zhang ORCID: orcid.org/0000-0002-3054-5883^nAff1

432 Accesses
Explore all metrics

Abstract

We review the grammatical inference problem for regular languages which aims to generate a deterministic finite automaton from a representative set of training sample strings known to be in or not in the language. Although the general problem of producing a minimal DFA consistent with a given sample is known to be NP-hard, it is possible to generate minimal consistent DFA in polynomial time if certain constraints are satisfied by the given samples. In this work we propose a new algorithm which generates minimal DFA if the given training samples satisfy a certain sufficient condition. On the negative side, we also show that this problem is indeed hard, such that even for a more restricted class of training sets, the problem of generating minimal consistent DFA is already intractable.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1

Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents

Article Open access 15 April 2024

A Survey on Adversarial Domain Adaptation

Article Open access 13 August 2022

Causal Structure Learning: A Combinatorial Perspective

Article Open access 01 August 2022

Notes

Although the general results seem negative, Gold has also proved that from a given regular language, it is possible to generate a characteristic sample set S, such that learning from that training set yields a minimal DFA in polynomial time.
Recall that Myhill–Nerode theorem defines equivalence classes on members of a language \(L\subseteq {\varSigma }^*\). If \(w_1,w_2\in {\varSigma }^*\) are defined as equivalent, then for all \(w\in {\varSigma }^*\), \(w_1\cdot w\) and \(w_2\cdot w\) are either both accepted by the language or both rejected by the language. The number of such equivalence classes for members of a regular language is always finite, which is equivalent to the number of states in a minimal DFA.
This is similar to [5], in which it is shown that there may be multiple minimal cover-automata that are consistent with a given finite input sample. Also in the original TB algorithm, the construction of \(\delta _M(w,a)\) is nondeterministic as if \(w\cdot a\in S^+\setminus {\texttt {red}}\), the choice of \(w'\) is any compatible state in \({\texttt {red}}\).
Philosophically, one may argue that when studying an automaton it is more reasonable to assume that the acceptability of all prefixes of a word should be observed.
Being maximal means when compared with others there needs to have semantic evidences, either by showing it is larger or smaller by containment, or neither larger nor smaller (i.e., incomparable by an evidence trace). The term semantic-completeness enforces that such a measure always exists at least for the maximal elements.

References

Angluin, D.: On the complexity of minimum inference of regular sets. Inf. Control 39, 337–350 (1978)
Article MathSciNet Google Scholar
Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75, 87–106 (1987)
Article MathSciNet Google Scholar
Angluin, D., Smith, C.H.: Inductive inference: theory and methods. ACM Comput. Surv. 15(3), 237–269 (1983)
Article MathSciNet Google Scholar
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Occam’s razor. Inf. Process. Lett. 24(6), 377–380 (1987)
Article MathSciNet Google Scholar
Câmpeanu, C., Sântean, N., Yu, S.: Minimal cover-automata for finite languages. Theor. Comput. Sci. 267, 3–16 (2001)
Article MathSciNet Google Scholar
de la Higuera, C.: Characteristic sets for polynomial grammatical inference. Mach. Learn. 27(2), 125–138 (1997)
Article Google Scholar
de la Higuera, C., Oncina, J., Vidal, E.: Identification of DFA: data-dependent versus data-independent algorithms. In: Proceedings of International Colloquium on Grammatical Inference (ICGI’96), pp. 313–325 (1996)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York (1979)
MATH Google Scholar
García, P., López, D., Vázquez de Parga, M.: Polynomial characteristic sets for DFA identification. Theor. Comput. Sci. 448, 41–46 (2012)
Article MathSciNet Google Scholar
Gold, M.E.: Language identification in the limit. Inf. Control 10, 447–474 (1967)
Article MathSciNet Google Scholar
Gold, M.E.: Complexity of automaton identificaton from given data. Inf. Control 37, 302–320 (1978)
Article Google Scholar
Gentilini, R., Piazza, C., Policriti, A.: From bisimulation to simulation: coarsest partition problems. J. Autom. Reason. 31(1), 73–103 (2003)
Article MathSciNet Google Scholar
Juillé, H., Pollack, J.B.: A stochastic search approach to grammar induction. In: Proceedings of International Colloquium on Grammatical Inference (ICGI’98), pp. 126–137 (1998)
Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In: Proceedings of International Colloquium on Grammatical Inference (ICGI’98), pp. 1–12 (1998)
Oncina, J., García, P.: Inferring regular languages in polynomial update time. In: Paredes, R., Cardoso, J.S., Pardo, X.M. (eds.) Pattern Recognition and Image Analysis, vol. 1, pp. 49–61. World Scientific, Singapore (1992)
Chapter Google Scholar
Parekh, R., Honavar, V.: Learning DFA from simple examples. Mach. Learn. 44(1–2), 9–35 (2001)
Article Google Scholar
Pitt, L., Warmuth, M.K.: The minimum consistent DFA problem cannot be approximated within any polynomial. J. ACM 40(1), 95–142 (1993)
Article MathSciNet Google Scholar
Trakhtenbrot, B.A., Barzdin, Y.M.: Finite Automata; Behavior and Synthesis (Fundamental Studies in Computer Science). North-Holland Publishing Co., Amsterdam (1973)
Google Scholar
Valiant, L.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984)
Article Google Scholar
van Glabbeek, R.: The linear time-branching time spectrum I. The semantics of concrete, sequential processes. In: Bergstra, J.A., Ponse, A., Smolka, S.A. (eds.) Handbook of Process Algebra, pp. 3–99. Elsevier, Amsterdam (2001)
Chapter Google Scholar
van Glabbeek, R., Ploeger, B.: Correcting a space-efficient simulation algorithm. In: Proceedings of International Conference on Computer Aided Verification (CAV’08), pp. 517–529 (2008)
Vázquez de Parga, M., García, P., López, D.: Minimal consistent DFA revisited. Theor. Comput. Sci. 647, 43–49 (2016)
Article MathSciNet Google Scholar

Download references

Author information

Chenyi Zhang
Present address: Jinan University, Guangzhou, China

Authors and Affiliations

Authors

Chenyi Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chenyi Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, C. Minimal consistent DFA from sample strings. Acta Informatica 57, 657–670 (2020). https://doi.org/10.1007/s00236-020-00365-8

Download citation

Received: 15 April 2019
Accepted: 14 January 2020
Published: 06 May 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00236-020-00365-8

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Minimal consistent DFA from sample strings

Abstract

Access this article

Similar content being viewed by others

Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents

A Survey on Adversarial Domain Adaptation

Causal Structure Learning: A Combinatorial Perspective

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Navigation

Minimal consistent DFA from sample strings

Abstract

Access this article

Similar content being viewed by others

Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents

A Survey on Adversarial Domain Adaptation

Causal Structure Learning: A Combinatorial Perspective

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation