Paper Title
Goal-directed Generation of Discrete Structures with Conditional Generative Models
Paper Authors
Paper Abstract
Despite recent advances, goal-directed generation of structured discrete data remains challenging. For problems such as program synthesis (generating source code) and materials design (generating molecules), finding examples which satisfy desired constraints or exhibit desired properties is difficult. In practice, expensive heuristic search or reinforcement learning algorithms are often employed. In this paper we investigate the use of conditional generative models which directly attack this inverse problem, by modeling the distribution of discrete structures given properties of interest. Unfortunately, maximum likelihood training of such models often fails, with samples from the generative model inadequately respecting the input properties. To address this, we introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward. We avoid the high-variance score-function estimators that would otherwise be required by sampling from an approximation to the normalized rewards, allowing simple Monte Carlo estimation of model gradients. We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value. In both cases, we find improvements over maximum likelihood estimation and other baselines.
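To make the contrast in the abstract concrete, below is a minimal toy sketch (not the paper's actual algorithm or model) of the two gradient estimators it mentions: a score-function (REINFORCE) estimator that samples from the model, versus sampling from a distribution proportional to the normalized rewards and taking a simple Monte Carlo estimate of a reward-weighted log-likelihood gradient. The categorical model, reward values, and step sizes here are all hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a categorical "generative model" over K discrete structures,
# parameterized by logits theta, with a fixed (hypothetical) reward per structure.
K = 5
theta = np.zeros(K)
reward = np.array([0.1, 0.2, 1.0, 0.2, 0.1])

def probs(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def grad_log_p(x, p):
    # grad_theta log p(x) for a softmax-categorical model
    g = -p.copy()
    g[x] += 1.0
    return g

def score_function_grad(theta, n=1000):
    # REINFORCE estimator of grad E_p[r(x)]: average r(x) * grad log p(x)
    # over samples drawn from the model itself (typically high variance).
    p = probs(theta)
    xs = rng.choice(K, size=n, p=p)
    return sum(reward[x] * grad_log_p(x, p) for x in xs) / n

def reward_weighted_grad(theta, n=1000):
    # Alternative sketched in the abstract: draw samples from a distribution
    # q(x) proportional to the rewards, then take a simple Monte Carlo estimate
    # of grad E_q[log p(x)] -- a reward-weighted maximum likelihood step.
    q = reward / reward.sum()
    xs = rng.choice(K, size=n, p=q)
    p = probs(theta)
    return sum(grad_log_p(x, p) for x in xs) / n

# A few gradient ascent steps with the reward-weighted estimator:
# probability mass should concentrate on the highest-reward structure.
for _ in range(200):
    theta += 0.5 * reward_weighted_grad(theta, n=200)

print(probs(theta).argmax())
```

In this sketch the reward-weighted update drives the model toward the reward-normalized distribution, so after training most probability mass sits on the highest-reward structure; the real method applies this idea to conditional models over sequences rather than a flat categorical.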