通过奖励编程

论文标题

通过奖励编程

Programming by Rewards

论文作者

Natarajan, Nagarajan, Karthikeyan, Ajaykrishna, Jain, Prateek, Radicek, Ivan, Rajamani, Sriram, Gulwani, Sumit, Gehrke, Johannes

论文摘要

我们对``通过奖励进行编程''（PBR）进行正规化和研究，这是一种指定和合成子例程的新方法，用于优化一些定量度量，例如性能，资源利用或基准上的正确性。 PBR规范由（1）输入功能$ x $组成，（2）奖励功能$ r $，以黑色框组件（我们只能运行）模型，为每个执行分配奖励。合成器的目标是合成“决策功能” $ F $，该$ F $将功能转换为黑框组件的决策值，以最大程度地提高预期奖励$ e [r \ circ f（x）] $，以执行决策$ f（x）$的各种$ x $。我们考虑在无环的IF-then-else程序中，在DSL中的一个决策函数，该程序可以在树结构中以输入特征的线性函数分支，并计算树叶中输入的线性函数。我们发现该DSL捕获了程序员手动编写的决策功能。我们的技术贡献是使用连续优化技术来执行诸如IF-then-else程序之类的决策功能的综合。我们还表明，在奖励满足良好属性的情况下，该框架是理论上有创始的 - 合成代码在精确的意义上是最佳的。我们已经利用PBR来综合与散文代码库中的搜索和排名有关的非平凡决策功能（工业实力计划综合框架），并在多年的调整年度中获得竞争成果，以手动书面程序。我们对实际案例研究（包括散文）以及简单合成基准的其他基线技术进行了经验评估。

We formalize and study ``programming by rewards'' (PBR), a new approach for specifying and synthesizing subroutines for optimizing some quantitative metric such as performance, resource utilization, or correctness over a benchmark. A PBR specification consists of (1) input features $x$, and (2) a reward function $r$, modeled as a black-box component (which we can only run), that assigns a reward for each execution. The goal of the synthesizer is to synthesize a "decision function" $f$ which transforms the features to a decision value for the black-box component so as to maximize the expected reward $E[r \circ f (x)]$ for executing decisions $f(x)$ for various values of $x$. We consider a space of decision functions in a DSL of loop-free if-then-else programs, which can branch on linear functions of the input features in a tree-structure and compute a linear function of the inputs in the leaves of the tree. We find that this DSL captures decision functions that are manually written in practice by programmers. Our technical contribution is the use of continuous-optimization techniques to perform synthesis of such decision functions as if-then-else programs. We also show that the framework is theoretically-founded ---in cases when the rewards satisfy nice properties, the synthesized code is optimal in a precise sense. We have leveraged PBR to synthesize non-trivial decision functions related to search and ranking heuristics in the PROSE codebase (an industrial strength program synthesis framework) and achieve competitive results to manually written procedures over multiple man years of tuning. We present empirical evaluation against other baseline techniques over real-world case studies (including PROSE) as well on simple synthetic benchmarks.

下载PDF全文

下载文献需遵守相关版权规定

论文标题