Ordering regular languages: a danger zone, arXiv - CS - Formal Languages and Automata Theory


Ordering regular languages: a danger zone
arXiv - CS - Formal Languages and Automata Theory, Pub Date: 2021-06-01, DOI: arxiv-2106.00315
Giovanna D'Agostino, Davide Martincigh, Alberto Policriti

Ordering the states of a given automaton, starting from an order on the underlying alphabet, is a natural step towards a computational treatment of the language the automaton accepts. Along this path, Wheeler graphs were recently introduced as an extension/adaptation of the Burrows-Wheeler Transform (the now famous BWT, originally defined on strings) to graphs. These graphs are an important data structure for languages: they allow very efficient storage of an automaton's transition function while providing fast support for all sorts of substring queries. This is possible because of a property of Wheeler graphs, the so-called path coherence, which consists in an ordering on nodes that "propagates" to (collections of) strings. When a Wheeler graph is viewed as an automaton, the ordering on strings corresponds to the co-lexicographic order of the words entering each state. This leads naturally to the class of regular languages accepted by Wheeler automata, i.e. the Wheeler languages. It has been shown that, in contrast to the general case, the classic determinization by powerset construction is polynomial on Wheeler languages. As a consequence, most of the classical problems turn out to be "easy" (that is, solvable in polynomial time) on Wheeler languages. Moreover, deciding whether a DFA is Wheeler and deciding whether a DFA accepts a Wheeler language can both be done in polynomial time. Our contribution here is to put an upper bound on the easy problems. For instance, whenever we generalize by switching to general NFAs or by not fixing an order on the underlying alphabet, the above-mentioned problems become "hard", that is, NP-complete or even PSPACE-complete.
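To make the ordering notions in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper. The function names (`colex_sorted`, `is_wheeler_order`), the edge-triple encoding, and the toy automaton are all hypothetical, and the two axioms checked are the standard Wheeler graph conditions from the literature, which the abstract does not spell out. The sketch only verifies a given candidate order by brute force; it does not search for one.

```python
def colex_sorted(words):
    """Sort words co-lexicographically, i.e. compare them right to left."""
    return sorted(words, key=lambda w: w[::-1])


def is_wheeler_order(order, edges):
    """Check whether `order` (a list of states) is a Wheeler order.

    `edges` is a collection of (source, target, label) triples; labels are
    compared with Python's default ordering (assumed alphabet order).
    Assumed axioms, for any two edges (u, v, a) and (u2, v2, a2):
      1. a <  a2              implies  v <  v2
      2. a == a2 and u < u2   implies  v <= v2
    plus: states with no incoming edge precede all other states.
    """
    edges = list(edges)
    rank = {state: i for i, state in enumerate(order)}
    has_incoming = {target for _, target, _ in edges}

    # States without incoming edges must come first in the order.
    seen_incoming = False
    for state in order:
        if state in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Quadratic pairwise check of the two axioms (fine for a small example).
    for u, v, a in edges:
        for u2, v2, a2 in edges:
            if a < a2 and not rank[v] < rank[v2]:
                return False
            if a == a2 and rank[u] < rank[u2] and not rank[v] <= rank[v2]:
                return False
    return True


# Toy automaton over the alphabet a < b (hypothetical example).
EDGES = [(0, 1, "a"), (0, 2, "b"), (1, 3, "b")]

print(colex_sorted(["ab", "ba", "aa"]))       # ['aa', 'ba', 'ab']
print(is_wheeler_order([0, 1, 2, 3], EDGES))  # True
print(is_wheeler_order([0, 2, 1, 3], EDGES))  # False
```

In the toy automaton the words reaching states 1, 2, 3 are "a", "b", "ab", and their co-lexicographic order (a, b, ab) matches the state order 1, 2, 3, which is the correspondence the abstract describes. Swapping states 1 and 2 breaks the first axiom, because the target of an a-labelled edge would then follow the target of a b-labelled edge.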


Updated: 2021-06-02
