Automatic Repair of Vulnerable Regular Expressions

arXiv - CS - Programming Languages. Pub Date: 2020-10-23, DOI: arxiv-2010.12450
Nariyoshi Chida and Tachio Terauchi

A regular expression is called vulnerable if there exist input strings on which the usual backtracking-based matching algorithm runs in super-linear time. Software containing vulnerable regular expressions is prone to algorithmic-complexity denial-of-service attacks, in which a malicious user provides input strings that exhibit the bad behavior. Because regular expressions are prevalent in modern software, vulnerable regular expressions are a serious threat to software security. While there has been prior work on detecting vulnerable regular expressions, in this paper we present a first step toward repairing a possibly vulnerable regular expression. Importantly, our method handles real-world regular expressions containing extended features such as lookarounds, capturing groups, and backreferences. (The problem is actually trivial without such extensions, since any pure regular expression can be made invulnerable via a DFA conversion.) We build our approach on the recent work on example-based repair of regular expressions by Pan et al. [Pan et al. 2019], which synthesizes a regular expression that is syntactically close to the original one and correctly classifies a given set of positive and negative examples. The key new idea is the use of linear-time constraints, which disambiguate a regular expression and ensure linear-time matching. We generate the constraints using an extended nondeterministic finite automaton that supports the extended features in real-world regular expressions. While our method is not guaranteed to produce a semantically equivalent regular expression, we empirically show that the repaired regular expressions tend to be nearly indistinguishable from the original ones.
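To make the notion of vulnerability concrete, the following is a minimal sketch of a backtracking matcher that counts its own steps. It is not the authors' repair algorithm; the tiny tuple-based regex representation, the function names `bt` and `run`, and the example pair `(a|a)*` versus its hand-written disambiguated form `a*` are all assumptions chosen for illustration. On inputs of the form `aa...ab`, which neither expression fully matches, the ambiguous expression forces the matcher to try every way of splitting the run of `a`s, while the unambiguous one fails after a linear number of steps.

```python
# Illustrative only: a toy backtracking matcher with a step counter.
# Regexes are nested tuples (a hypothetical representation):
#   ('chr', c), ('cat', r1, r2), ('alt', r1, r2), ('star', r)

def bt(node, s, i, k, steps):
    """Try to match `node` at position i, then call continuation k."""
    steps[0] += 1
    kind = node[0]
    if kind == 'chr':
        return bool(i < len(s) and s[i] == node[1] and k(i + 1))
    if kind == 'cat':
        return bt(node[1], s, i, lambda j: bt(node[2], s, j, k, steps), steps)
    if kind == 'alt':
        # Backtracking: if the left branch fails, retry with the right one.
        return bt(node[1], s, i, k, steps) or bt(node[2], s, i, k, steps)
    if kind == 'star':
        # Greedy: attempt another non-empty iteration, else stop iterating.
        more = bt(node[1], s, i,
                  lambda j: j > i and bt(node, s, j, k, steps), steps)
        return bool(more or k(i))
    raise ValueError(f"unknown node kind: {kind}")

def run(regex, text):
    """Full-match `text` against `regex`; return (matched, step_count)."""
    steps = [0]
    matched = bt(regex, text, 0, lambda j: j == len(text), steps)
    return matched, steps[0]

A = ('chr', 'a')
VULNERABLE = ('star', ('alt', A, A))   # (a|a)*  : ambiguous
REPAIRED = ('star', A)                 # a*      : same language, unambiguous

if __name__ == "__main__":
    for n in (4, 8, 12):
        attack = "a" * n + "b"         # never matches; forces full backtracking
        _, slow = run(VULNERABLE, attack)
        _, fast = run(REPAIRED, attack)
        print(f"n={n:2d}  (a|a)*: {slow:6d} steps   a*: {fast:3d} steps")
```

In this toy setting the step count for `(a|a)*` roughly doubles with each additional `a`, whereas the count for `a*` grows by a constant. The two expressions accept exactly the same strings, which is the simplest case of a repair; for expressions with lookarounds or backreferences no such equivalence is guaranteed, as the abstract notes.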


Updated: 2020-10-26
