Paper Title
Generating Sequences by Learning to Self-Correct
Paper Authors
Paper Abstract
Sequence generation applications require satisfying semantic constraints, such as ensuring that programs are correct, using certain keywords, or avoiding undesirable content. Language models, whether fine-tuned or prompted with few-shot demonstrations, frequently violate these constraints, and lack a mechanism to iteratively revise their outputs. Moreover, some powerful language models are of extreme scale or inaccessible, making it inefficient, if not infeasible, to update their parameters for task-specific adaptation. We present Self-Correction, an approach that decouples an imperfect base generator (an off-the-shelf language model or supervised sequence-to-sequence model) from a separate corrector that learns to iteratively correct imperfect generations. To train the corrector, we propose an online training procedure that can use either scalar or natural language feedback on intermediate imperfect generations. We show that Self-Correction improves upon the base generator in three diverse generation tasks - mathematical program synthesis, lexically-constrained generation, and toxicity control - even when the corrector is much smaller than the base generator.
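To make the decoupled generator-corrector design concrete, below is a minimal sketch of the inference-time loop the abstract describes: a frozen base generator produces a draft, and a separate, smaller corrector repeatedly revises it until a task-specific check passes or an iteration budget runs out. All names here (`base_generate`, `corrector_revise`, `is_satisfactory`, `max_rounds`) are hypothetical placeholders for illustration, not the authors' actual API.

```python
from typing import Callable

def self_correct(
    prompt: str,
    base_generate: Callable[[str], str],          # frozen base LM (off-the-shelf or supervised seq2seq)
    corrector_revise: Callable[[str, str], str],  # trained corrector: (prompt, draft) -> revised draft
    is_satisfactory: Callable[[str], bool],       # task-specific check, e.g. program passes its tests
    max_rounds: int = 3,                          # hypothetical iteration budget
) -> str:
    """Generate once with the base model, then iteratively revise with the corrector."""
    draft = base_generate(prompt)
    for _ in range(max_rounds):
        if is_satisfactory(draft):
            break
        draft = corrector_revise(prompt, draft)
    return draft
```

At training time, the paper's online procedure exposes the corrector to intermediate imperfect generations and uses scalar or natural-language feedback on them as the learning signal; the boolean check above stands in for whichever feedback a given task provides.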