A deterministic parsing algorithm for ambiguous regular expressions
Acta Informatica (IF 0.4), Pub Date: 2020-02-04, DOI: 10.1007/s00236-020-00366-7
Angelo Borsotti, Luca Breveglieri, Stefano Crespi Reghizzi, Angelo Morzenti

We introduce a new parser generator, called Berry–Sethi Parser (BSP), for ambiguous regular expressions (RE). The generator constructs a deterministic finite-state transducer that recognizes an input string, as the classical Berry–Sethi algorithm does, and additionally outputs a linear representation of all the syntax trees of the string; for infinitely ambiguous strings, a policy for selecting representative sets of trees is chosen. To construct the transducer, the RE symbols, including letters, parentheses and other metasymbols, are distinctly numbered, so that the corresponding language becomes locally testable. In this way a deterministic position automaton can be constructed, which recognizes the input and translates it into a compact DAG representation of the syntax trees. The correctness of the construction is proved. The transducer runs in time linear in the length of the input. Its descriptive complexity is analyzed as a function of established RE parameters: the alphabetic width, the number of null string symbols and the height of the RE tree. A condition for checking RE ambiguity on the transducer graph is stated. Experimental results of running the parser generator and the parser on a large RE collection are presented. The POSIX RE disambiguation criterion has also been applied to the parser.
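
For readers unfamiliar with the starting point, the following is a minimal sketch of the classical Berry–Sethi (position automaton) recognizer that the abstract builds on. It is not the authors' BSP: it numbers only the letters, not the parentheses and metasymbols, and it produces no syntax trees. The tuple-based AST and the names `number`, `analyse` and `matches` are assumptions made for this illustration.

```python
from itertools import count

# Hypothetical RE AST used only for this sketch:
#   ("eps",)  ("sym", letter)  ("cat", a, b)  ("alt", a, b)  ("star", a)

def number(node, counter, letters):
    """Copy the AST, giving every letter occurrence a distinct position number."""
    kind = node[0]
    if kind == "eps":
        return node
    if kind == "sym":
        pos = next(counter)
        letters[pos] = node[1]          # remember which letter sits at this position
        return ("pos", pos)
    if kind == "star":
        return ("star", number(node[1], counter, letters))
    return (kind,
            number(node[1], counter, letters),
            number(node[2], counter, letters))

def analyse(node, follow):
    """Return (nullable, first, last) for the numbered AST and fill follow[pos]."""
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "pos":
        follow.setdefault(node[1], set())
        return False, {node[1]}, {node[1]}
    if kind == "star":
        _, first, last = analyse(node[1], follow)
        for p in last:                  # the body may repeat: last -> first
            follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], follow)
    n2, f2, l2 = analyse(node[2], follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                        # concatenation: last(left) -> first(right)
        follow[p] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last

def matches(regex, text):
    """Run the position automaton deterministically, one set of positions per letter."""
    letters, follow = {}, {}
    numbered = number(regex, count(1), letters)
    nullable, first, last = analyse(numbered, follow)
    state = None                        # None marks the initial state
    for ch in text:
        if state is None:
            candidates = first
        else:
            candidates = set().union(*(follow[p] for p in state))
        state = {p for p in candidates if letters[p] == ch}
        if not state:
            return False
    return nullable if state is None else bool(state & last)

# (a | ab)(b | eps): the string "ab" has two syntax trees, so the RE is ambiguous.
AMBIGUOUS = ("cat",
             ("alt", ("sym", "a"), ("cat", ("sym", "a"), ("sym", "b"))),
             ("alt", ("sym", "b"), ("eps",)))
```

With this sketch, `matches(AMBIGUOUS, "ab")` accepts the string but says nothing about its two derivations, since distinct positions are merged into one set-valued state. Numbering the metasymbols as well, as the abstract describes, is what lets the transducer keep track of the trees.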

Updated: 2020-02-04