Morbig: A static parser for POSIX shell
Journal of Computer Languages (IF 1.7), Pub Date: 2020-02-18, DOI: 10.1016/j.cola.2020.100944
Yann Régis-Gianas, Nicolas Jeannerod, Ralf Treinen

The POSIX shell language defies the conventional wisdom of compiler construction on several levels. The shell language was not designed for static parsing; it was designed with an intertwining of syntactic analysis and execution by expansion in mind. Token recognition cannot be specified by regular expressions, and lexical analysis depends on both the parsing context and the evaluation context. Moreover, the unorthodox design choices of the shell language fit poorly into the specification languages usually used to describe other programming languages. This makes the standard Lex-and-Yacc pipeline inadequate for implementing a parser for POSIX shell. Existing shell parsers are complex and rely on low-level, character-level parsing code that is difficult to relate to the POSIX specification. We find it hard to trust such parsers, especially when using them to write automatic verification tools for shell scripts.
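One face of context-dependent lexical analysis can be sketched in OCaml, the host language of Menhir. The toy classifier below assumes a simplified version of two POSIX token-recognition rules: a word spelled like a reserved word (`if`, `for`, ...) counts as a reserved word only when it is the first word of a command, and a word of the form `NAME=value` counts as an assignment only before the command name. The types and functions (`classify`, `as_assignment`, `is_name`) are hypothetical illustrations, not Morbig's code, and the model deliberately ignores positions such as the word after `for` or `case`.

```ocaml
(* Token kinds the grammar distinguishes, although the raw input
   only contains words. *)
type token =
  | Reserved of string               (* e.g. "if", "for", "done" *)
  | Assignment of string * string    (* NAME=value *)
  | Word of string

(* Simplified lexer context inside one command. *)
type context =
  | Command_start  (* first word: reserved word, assignment or name *)
  | Before_name    (* after assignments: assignment or command name *)
  | Arguments      (* after the command name: plain words only *)

(* The POSIX reserved words. *)
let reserved_words =
  [ "!"; "{"; "}"; "case"; "do"; "done"; "elif"; "else"; "esac";
    "fi"; "for"; "if"; "in"; "then"; "until"; "while" ]

(* A POSIX name: a letter or underscore, then letters, digits, underscores. *)
let is_name s =
  let n = String.length s in
  let is_alpha c = c = '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') in
  let is_alnum c = is_alpha c || (c >= '0' && c <= '9') in
  let rec rest i = i >= n || (is_alnum s.[i] && rest (i + 1)) in
  n > 0 && is_alpha s.[0] && rest 1

(* Split NAME=value, if the part before the first '=' is a valid name. *)
let as_assignment w =
  match String.index_opt w '=' with
  | Some i when i > 0 && is_name (String.sub w 0 i) ->
      Some (String.sub w 0 i, String.sub w (i + 1) (String.length w - i - 1))
  | _ -> None

(* Classify the words of one command from left to right. Simplification:
   after a reserved word we assume a new command starts, which holds for
   "if", "then", "do", ... but not for "for", "case" or "in". *)
let classify words =
  let rec go ctx acc = function
    | [] -> List.rev acc
    | w :: rest ->
        let tok, next =
          match ctx with
          | Command_start when List.mem w reserved_words ->
              (Reserved w, Command_start)
          | Command_start | Before_name -> (
              match as_assignment w with
              | Some (name, value) -> (Assignment (name, value), Before_name)
              | None -> (Word w, Arguments))
          | Arguments -> (Word w, Arguments)
        in
        go next (tok :: acc) rest
  in
  go Command_start [] words
```

For instance, in `x=1 echo for y=2` only `x=1` is classified as an assignment; `for` and `y=2` are ordinary arguments, even though the same spellings would be classified differently at the start of a command. A regular-expression lexer that ignores its context cannot make this distinction.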

This paper offers an overview of the technical difficulties related to the syntactic analysis of the POSIX shell language. It also describes how we have resolved these difficulties using advanced parsing techniques (namely speculative parsing, parser state introspection, context-dependent lexical analysis and longest-prefix parsing) while keeping the implementation at a sufficiently high level of abstraction so that experts can check that the POSIX standard is respected. The resulting tool, called Morbig, is an open-source static parser for a well-defined and realistic subset of the POSIX shell language. Its implementation crucially relies on the purity and incrementality of LR(1) parsers generated by Menhir, a parser generator for OCaml.
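Why purity helps with speculative parsing and parser-state introspection can be illustrated with a toy recognizer whose state is an immutable OCaml value. This is not Menhir's generated code or its incremental API: the grammar (`if ... then ... fi` over single-word commands), the types, and the functions `step`, `acceptable`, `lex`, `parse` and `longest_prefix` are all hypothetical. Because `step` never mutates its argument, the lexer can ask whether a keyword would be accepted in the current state and simply discard the answer's state, and longest-prefix parsing only has to remember the last accepting position.

```ocaml
(* Toy tokens: the input only has words; keywords are decided by the parser. *)
type token = Word of string | If | Then | Fi

(* An open [if] is either still in its condition or already in its body. *)
type frame = In_cond | In_body

(* Immutable parser state: open frames, and whether the innermost
   command list already contains at least one command. *)
type state = { stack : frame list; seen_cmd : bool }

let initial = { stack = []; seen_cmd = false }

(* One pure transition: [None] means a syntax error; [st] is never mutated. *)
let step st tok =
  match tok, st.stack with
  | Word _, _ -> Some { st with seen_cmd = true }
  | If, stack -> Some { stack = In_cond :: stack; seen_cmd = false }
  | Then, In_cond :: rest when st.seen_cmd ->
      Some { stack = In_body :: rest; seen_cmd = false }
  | Fi, In_body :: rest when st.seen_cmd ->
      Some { stack = rest; seen_cmd = true }
  | (Then | Fi), _ -> None

(* The whole input read so far forms a complete command list. *)
let accepting st = st.stack = [] && st.seen_cmd

(* Introspection: would the parser accept [tok] here? Since [step] is
   pure, trying it needs no undo; the resulting state is thrown away. *)
let acceptable st tok = step st tok <> None

let keyword_of = function
  | "if" -> Some If
  | "then" -> Some Then
  | "fi" -> Some Fi
  | _ -> None

(* Context-dependent lexing: a keyword-looking word becomes a keyword
   only if the current parser state accepts it; otherwise it is a Word. *)
let lex st w =
  match keyword_of w with
  | Some kw when acceptable st kw -> kw
  | _ -> Word w

(* Feed raw words one at a time, lexing each against the current state. *)
let parse words =
  let rec go st acc = function
    | [] -> if accepting st then Some (List.rev acc) else None
    | w :: rest -> (
        let tok = lex st w in
        match step st tok with
        | Some st' -> go st' (tok :: acc) rest
        | None -> None)
  in
  go initial [] words

(* Longest-prefix parsing: keep stepping and remember the last length at
   which the state was accepting; the remaining words are left unread. *)
let longest_prefix words =
  let rec go st n best = function
    | [] -> best
    | w :: rest -> (
        match step st (lex st w) with
        | None -> best
        | Some st' ->
            let n' = n + 1 in
            go st' n' (if accepting st' then Some n' else best) rest)
  in
  go initial 0 None words
```

With this toy, `parse ["echo"; "then"]` lexes `then` as a plain word because the current state does not accept the keyword after a bare command, and `longest_prefix` on `if a then b fi if c` reports that the first five words form the longest complete prefix. The toy does not reproduce POSIX's exact reserved-word rules; for example, it treats a stray `fi` as an ordinary command name.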




Updated: 2020-02-18