Information-theoretic User Interaction: Significant Inputs for Program Synthesis
arXiv - CS - Programming Languages Pub Date : 2020-06-22 , DOI: arxiv-2006.12638
Ashish Tiwari, Arjun Radhakrishna, Sumit Gulwani, and Daniel Perelman

Programming-by-example technologies are being deployed in industrial products for real-time synthesis of various kinds of data transformations. These technologies rely on the user to provide a few representative examples of the transformation task. Motivated by the need to find the most pertinent question to ask the user, in this paper, we introduce the significant questions problem and show that it is hard in general. We then develop an information-theoretic greedy approach for solving the problem. We justify the greedy algorithm using the conditional entropy result, which informally says that the question that achieves the maximum information gain is the one that we know least about. In the context of interactive program synthesis, we use the above result to develop an active program learner that generates the significant inputs to pose as queries to the user in each iteration. The procedure requires extending a passive program learner to a sampling program learner that is able to sample candidate programs from the set of all consistent programs to enable estimation of information gain. It also uses clustering of inputs, based on features in the inputs and the corresponding outputs, to sample a small set of candidate significant inputs. Our active learner is able to trade off false negatives for false positives and converge in a small number of iterations on a real-world dataset of around 800 string transformation tasks.
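
The greedy step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `output_entropy`, `most_significant_input`, `prune`, `SAMPLED`, and `CANDIDATES` are hypothetical, the three "sampled" programs are hand-written stand-ins for what a sampling program learner would return, and the samples are assumed to be equally weighted. Under those assumptions, and because each program is deterministic, the information gained by asking the user for the output on an input equals the entropy of the outputs that the sampled programs produce on it, so the greedy choice is the input on which the samples disagree most.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Program = Callable[[str], str]


def output_entropy(programs: Sequence[Program], x: str) -> float:
    """Shannon entropy (in bits) of the outputs the sampled programs give on x."""
    counts = Counter(p(x) for p in programs)
    total = len(programs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def most_significant_input(programs: Sequence[Program],
                           candidates: Sequence[str]) -> str:
    """Greedy choice: the candidate input we know least about."""
    return max(candidates, key=lambda x: output_entropy(programs, x))


def prune(programs: Sequence[Program], x: str, answer: str) -> List[Program]:
    """Keep only the programs consistent with the user's answer on x."""
    return [p for p in programs if p(x) == answer]


# Hypothetical stand-ins for programs drawn by a sampling learner.
SAMPLED: List[Program] = [str.upper, str.capitalize, str.title]
# Hypothetical candidate inputs (the paper obtains these via clustering).
CANDIDATES = ["a", "ab", "ab cd"]

query = most_significant_input(SAMPLED, CANDIDATES)
remaining = prune(SAMPLED, query, "Ab Cd")  # pretend the user answered "Ab Cd"
```

Here all three programs agree on "a", two of them agree on "ab", and all three differ on "ab cd", so "ab cd" is selected as the query, and a single answer from the user is enough to leave one consistent program.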

Updated: 2020-06-24