Lake symbols for island parsing
arXiv - CS - Programming Languages. Pub Date: 2020-10-30, DOI: arxiv-2010.16306
Katsumi Okuda, Shigeru Chiba

Context: An island parser reads an input text and builds the parse (or abstract syntax) tree for only the programming constructs of interest in that text. These constructs are called islands; the rest of the text is called water, which the parser skips. Because an island parser does not have to parse every detail of the input, it is often easy to develop yet still useful for a number of software engineering tools. With a parser generator, the developer can implement an island parser by describing only a small number of grammar rules, for example in Parsing Expression Grammar (PEG). Inquiry: In practice, however, the grammar rules are often complicated because the developer must also define the water inside an island; otherwise, island parsing would not reduce the total number of grammar rules. To describe the grammar rules for such water, the developer must take the other rules into account and enumerate a set of symbols, which we call alternative symbols. Because of this difficulty, island parsing does not seem to be widely used today despite its usefulness in many applications. Approach: This paper proposes lake symbols to address this difficulty in developing an island parser. It also presents an extension to PEG that supports lake symbols. Lake symbols automate the enumeration of the alternative symbols for the water inside an island. The paper proposes an algorithm for translating the extended PEG into normal PEG, which can then be given to an existing PEG-based parser generator. Knowledge: The user can use lake symbols to define water without specifying each alternative symbol. Our algorithms can calculate all alternative symbols for a lake symbol, based on where the lake symbol is used in the grammar. Grounding: We implemented a parser generator that accepts our extended PEG, and we implemented 36 island parsers for Java and 20 island parsers for Python. Our experiments show that lake symbols reduce the number of grammar rules by 42% for Java and by 89% for Python on average, excluding the case where the islands are expressions. Importance: This work eases the use of island parsing. Lake symbols let the user define the water inside an island more simply than before. Defining the water inside an island is essential for applying island parsing to practical programming languages.
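
To make the notion of alternative symbols concrete, the following is a minimal hand-written sketch in Python, not the paper's algorithm or its extended PEG. It assumes a toy Java-like input in which the only islands are `class Name { ... }` declarations; the function names and the tree shape are hypothetical. The point is `parse_body`: the water inside a class body cannot be "any character", because it must stop at the closing `}`, step over balanced `{ ... }` blocks, and still recognize nested islands. Those three cases are the alternative symbols that the developer has to enumerate by hand here and that lake symbols are meant to derive automatically.

```python
import re

# Island header: 'class' Name '{' (a toy assumption, not real Java syntax).
CLASS_HEAD = re.compile(r"\bclass\s+([A-Za-z_]\w*)\s*\{")


def parse_class(text, pos):
    """Island rule: 'class' Name '{' body '}'. Returns (tree, next_pos) or None."""
    m = CLASS_HEAD.match(text, pos)
    if m is None:
        return None
    children, pos = parse_body(text, m.end())
    if pos >= len(text):
        return None  # no closing '}': the island does not match here
    return {"class": m.group(1), "nested": children}, pos + 1


def parse_body(text, pos):
    """Water inside an island, with hand-enumerated alternative symbols.

    Roughly (class / '{' body '}' / !'}' .)* in PEG terms.
    Stops at the first unmatched '}' or at end of input.
    """
    children = []
    while pos < len(text) and text[pos] != "}":
        nested = parse_class(text, pos)
        if nested is not None:           # alternative 1: a nested island
            tree, pos = nested
            children.append(tree)
        elif text[pos] == "{":           # alternative 2: a balanced block
            inner, pos = parse_body(text, pos + 1)
            children.extend(inner)
            if pos < len(text):
                pos += 1                 # consume the block's closing '}'
        else:                            # plain water: skip one character
            pos += 1
    return children, pos


def parse_program(text):
    """Top level: (class / .)* where everything that is not an island is water."""
    islands, pos = [], 0
    while pos < len(text):
        found = parse_class(text, pos)
        if found is not None:
            tree, pos = found
            islands.append(tree)
        else:
            pos += 1
    return islands


if __name__ == "__main__":
    src = "import x; class A { int f() { return 1; } class B { } } int y; class C { }"
    print(parse_program(src))
```

If the balanced-block case were left out of `parse_body`, the `}` that ends the method body of `f` would be mistaken for the end of class `A`. Every new kind of island or delimiter adds another case of this sort, which is the maintenance burden the abstract describes.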

Updated: 2020-11-02