Multi-modal Synthesis of Regular Expressions

arXiv - CS - Programming Languages, Pub Date: 2019-08-09, DOI: arxiv-1908.03316
Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, Isil Dillig

In this paper, we propose a multi-modal synthesis technique for automatically constructing regular expressions (regexes) from a combination of examples and natural language. Using multiple modalities is useful in this context because natural language alone is often highly ambiguous, whereas examples in isolation are often not sufficient for conveying user intent. Our proposed technique first parses the English description into a so-called hierarchical sketch that guides our programming-by-example (PBE) engine. Since the hierarchical sketch captures crucial hints, the PBE engine can leverage this information both to prioritize the search and to make useful deductions for pruning the search space. We have implemented the proposed technique in a tool called Regel and evaluated it on over three hundred regexes. Our evaluation shows that Regel achieves 80% accuracy, whereas the NLP-only and PBE-only baselines achieve 43% and 26% respectively. We also compare our proposed PBE engine against an adaptation of AlphaRegex, a state-of-the-art regex synthesis tool, and show that our proposed PBE engine is an order of magnitude faster, even if we adapt the search algorithm of AlphaRegex to leverage the sketch. Finally, we conduct a user study involving 20 participants and show that users are twice as likely to come up with the desired regex when using Regel as when working without it.
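The idea of letting hints from the description narrow an example-driven search can be illustrated with a toy sketch. This is not Regel's algorithm: the function name `synthesize`, the flat `hints` list (a crude stand-in for a hierarchical sketch), and the tiny set of operators are all assumptions made for illustration. The code enumerates concatenations of hinted fragments, shortest first, and returns the first candidate that accepts every positive example and rejects every negative one.

```python
import itertools
import re


def synthesize(hints, positives, negatives, max_parts=3):
    """Return the first candidate regex consistent with all examples, or None.

    hints: regex fragments standing in for components a sketch might suggest
           (hypothetical input format, not Regel's sketch language).
    """
    # Each hinted fragment may appear as-is or under a simple quantifier.
    atoms = [h + q for h in hints for q in ("", "+", "?")]
    # Try shorter candidates first, a crude stand-in for search prioritization.
    for size in range(1, max_parts + 1):
        for parts in itertools.product(atoms, repeat=size):
            candidate = "".join(parts)
            pattern = re.compile(candidate)
            accepts_all = all(pattern.fullmatch(s) for s in positives)
            rejects_all = not any(pattern.fullmatch(s) for s in negatives)
            if accepts_all and rejects_all:
                return candidate
    return None


# Hypothetical description: "digits, then a dash, then lowercase letters".
result = synthesize(
    hints=["[0-9]", "-", "[a-z]"],
    positives=["12-ab", "7-x"],
    negatives=["12ab", "-ab", "12-"],
)
print(result)
```

Without the hints, the same enumeration would have to range over every character class and operator. Restricting it to three fragments is what keeps this toy search small, which is the intuition behind using the description to guide the PBE engine.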


Updated: 2020-03-24
