Programming by Rewards, arXiv - CS - Programming Languages

Programming by Rewards
arXiv - CS - Programming Languages, Pub Date: 2020-07-14, DOI: arxiv-2007.06835
Nagarajan Natarajan, Ajaykrishna Karthikeyan, Prateek Jain, Ivan Radicek, Sriram Rajamani, Sumit Gulwani, Johannes Gehrke

We formalize and study "programming by rewards" (PBR), a new approach for specifying and synthesizing subroutines that optimize a quantitative metric such as performance, resource utilization, or correctness over a benchmark. A PBR specification consists of (1) input features $x$ and (2) a reward function $r$, modeled as a black-box component (which we can only run) that assigns a reward to each execution. The goal of the synthesizer is to synthesize a "decision function" $f$ that transforms the features into a decision value for the black-box component so as to maximize the expected reward $E[r \circ f (x)]$ obtained by executing decisions $f(x)$ over various values of $x$. We consider a space of decision functions in a DSL of loop-free if-then-else programs, which branch on linear functions of the input features in a tree structure and compute a linear function of the inputs at the leaves of the tree. We find that this DSL captures the decision functions that programmers write manually in practice. Our technical contribution is the use of continuous-optimization techniques to synthesize such decision functions as if-then-else programs. We also show that the framework is theoretically founded: when the rewards satisfy nice properties, the synthesized code is optimal in a precise sense. We have leveraged PBR to synthesize non-trivial decision functions related to search and ranking heuristics in the PROSE codebase (an industrial-strength program synthesis framework) and achieve results competitive with manually written procedures tuned over multiple man-years. We present an empirical evaluation against other baseline techniques on real-world case studies (including PROSE) as well as on simple synthetic benchmarks.
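To make the setup concrete, here is a minimal sketch under stated assumptions, not the authors' algorithm. It encodes a depth-1 program from the DSL described above (one linear split, two linear leaves) as a flat parameter vector, treats the reward as a black box that can only be called, and maximizes the average reward over sampled inputs with a two-point zeroth-order gradient estimator (random Gaussian perturbations). That estimator is a swapped-in, generic continuous-optimization technique; the abstract does not say which method PBR uses. All names (`LinearTree`, `synthesize`, `toy_reward`) and the piecewise-linear target are hypothetical and do not come from the paper or the PROSE codebase.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box reward type: (features, decision value) -> reward
Reward = Callable[[Sequence[float], float], float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


@dataclass
class LinearTree:
    """Depth-1 DSL program: if w.x + b > 0 then (a1.x + c1) else (a2.x + c2)."""
    n: int  # number of input features

    def num_params(self) -> int:
        # One linear split plus two linear leaves, each with n weights and a bias
        return 3 * (self.n + 1)

    def decide(self, theta: Sequence[float], x: Sequence[float]) -> float:
        k = self.n + 1
        w, b = theta[: self.n], theta[self.n]
        then_leaf, else_leaf = theta[k: 2 * k], theta[2 * k: 3 * k]
        leaf = then_leaf if dot(w, x) + b > 0 else else_leaf
        return dot(leaf[: self.n], x) + leaf[self.n]


def expected_reward(prog: LinearTree, theta: Sequence[float],
                    reward: Reward, xs: List[List[float]]) -> float:
    # Empirical estimate of E[r(f(x))] over the sampled inputs
    return sum(reward(x, prog.decide(theta, x)) for x in xs) / len(xs)


def synthesize(prog: LinearTree, reward: Reward, xs: List[List[float]],
               steps: int = 500, lr: float = 0.05, mu: float = 0.1,
               seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    theta = [0.0] * prog.num_params()
    best_theta, best_r = list(theta), expected_reward(prog, theta, reward, xs)
    for _ in range(steps):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + mu * ui for t, ui in zip(theta, u)]
        minus = [t - mu * ui for t, ui in zip(theta, u)]
        # Two-point estimate of the gradient of the Gaussian-smoothed objective;
        # the reward is only evaluated, never differentiated
        slope = (expected_reward(prog, plus, reward, xs)
                 - expected_reward(prog, minus, reward, xs)) / (2 * mu)
        theta = [t + lr * slope * ui for t, ui in zip(theta, u)]
        r = expected_reward(prog, theta, reward, xs)
        if r > best_r:  # keep the best program seen so far
            best_theta, best_r = list(theta), r
    return best_theta, best_r


def toy_reward(x: Sequence[float], d: float) -> float:
    # Hypothetical black box: penalizes distance from a piecewise-linear target
    target = 2 * x[0] if x[0] > 0 else -x[0]
    return -(d - target) ** 2


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [[data_rng.uniform(-1.0, 1.0)] for _ in range(100)]
    prog = LinearTree(n=1)
    start = expected_reward(prog, [0.0] * prog.num_params(), toy_reward, xs)
    theta, best = synthesize(prog, toy_reward, xs)
    print(f"initial reward {start:.3f} -> synthesized reward {best:.3f}")
```

The optimizer touches `toy_reward` only through calls, mirroring the black-box assumption. Because hard if-then-else branches make the objective discontinuous in the split parameters, the sketch keeps the best parameters seen rather than trusting the last iterate; a softened (sigmoid) split during training would be another reasonable design choice.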

Updated: 2020-07-15
