Don't Panic! Better, Fewer, Syntax Errors for LR Parsers, arXiv - CS - Programming Languages

Don't Panic! Better, Fewer, Syntax Errors for LR Parsers
arXiv - CS - Programming Languages, Pub Date: 2018-04-19, DOI: arxiv-1804.07133
Lukas Diekmann and Laurence Tratt

Syntax errors are generally easy to fix for humans, but not for parsers in general nor LR parsers in particular. Traditional 'panic mode' error recovery, though easy to implement and applicable to any grammar, often leads to a cascading chain of errors that drown out the original. More advanced error recovery techniques suffer less from this problem but have seen little practical use because their typical performance was seen as poor, their worst case unbounded, and the repairs they reported arbitrary. In this paper we introduce the CPCT+ algorithm, and an implementation of that algorithm, that address these issues. First, CPCT+ reports the complete set of minimum cost repair sequences for a given location, allowing programmers to select the one that best fits their intention. Second, on a corpus of 200,000 real-world syntactically invalid Java programs, CPCT+ is able to repair 98.37% of files within a timeout of 0.5s. Finally, CPCT+ uses the complete set of minimum cost repair sequences to reduce the cascading error problem, where incorrect error recovery causes further spurious syntax errors to be identified. Across the test corpus, CPCT+ reports 435,812 error locations to the user, reducing the cascading error problem substantially relative to the 981,628 error locations reported by panic mode.
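To put the abstract's figures in proportion, the short Python sketch below works out the arithmetic behind them. It uses only the four numbers quoted above; the constant and function names are illustrative choices, not taken from the paper, and the repaired-file count is approximate because the quoted percentage is rounded.

```python
# Figures quoted in the abstract; all names here are illustrative.
CORPUS_FILES = 200_000
REPAIRED_PERCENT = 98.37
CPCT_PLUS_LOCATIONS = 435_812
PANIC_MODE_LOCATIONS = 981_628


def relative_reduction(baseline: int, improved: int) -> float:
    """Return the fractional drop from baseline to improved."""
    return (baseline - improved) / baseline


# Approximate number of files repaired within the 0.5s timeout.
repaired_files = round(CORPUS_FILES * REPAIRED_PERCENT / 100)

# Share of panic mode's reported error locations that CPCT+ avoids.
reduction = relative_reduction(PANIC_MODE_LOCATIONS, CPCT_PLUS_LOCATIONS)

print(f"Files repaired within the timeout: about {repaired_files:,}")
print(f"Fewer error locations than panic mode: {reduction:.1%}")
```

On these numbers, CPCT+ reports a little under half as many error locations as panic mode across the same corpus.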

Updated: 2020-07-06
