Paper Title

CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models

Authors

Bishal Santra, Ravi Ghadia, Manish Gupta, Pawan Goyal

Abstract

In the field of Natural Language Processing, there are many tasks that can be tackled effectively using the cross-entropy (CE) loss function. However, the task of dialog generation poses unique challenges for CE loss. This is because CE loss assumes that, for any given input, the only possible output is the one available as the ground truth in the training dataset. But, in dialog generation, there can be multiple valid responses (for a given context) that not only have different surface forms but can also be semantically different. Furthermore, CE loss computation for the dialog generation task does not take the input context into consideration and, hence, it grades the response irrespective of the context. To grade the generated response for qualities like relevance, engagingness, etc., the loss function should depend on both the context and the generated response. To address these limitations, this paper proposes CORAL, a novel loss function based on a reinforcement learning (RL) view of the dialog generation task, with a reward function that estimates human preference for generated responses while considering both the context and the response. Furthermore, to overcome challenges such as the high sample complexity of RL training and a large action space, we propose a mix-policy training algorithm. Notably, using CORAL we can train dialog generation models without assuming the ground truth to be the only correct response. Extensive comparisons on benchmark datasets demonstrate that CORAL-based models outperform strong state-of-the-art baseline models of different sizes.
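The contrast the abstract draws can be illustrated with a minimal sketch. Below, standard cross-entropy penalizes any deviation from the ground-truth tokens and never looks at the context, while a REINFORCE-style objective scales the log-likelihood of a sampled response by a reward computed over the (context, response) pair. Note that `toy_reward` here is a hypothetical word-overlap proxy invented for illustration; the paper instead learns a reward model estimating human preference, and its mix-policy training details are not reproduced here.

```python
import math

def ce_loss(gt_token_probs):
    """Standard cross-entropy over the ground-truth response.

    The loss only sums -log P(ground-truth token); the dialog context
    enters solely through the model's conditioning, so any response
    other than the ground truth is penalized regardless of relevance.
    """
    return -sum(math.log(p) for p in gt_token_probs)

def toy_reward(context, response):
    """Hypothetical stand-in for a learned preference reward: the
    fraction of response words that also occur in the context, as a
    crude relevance proxy (NOT the paper's reward model)."""
    ctx_words = set(context.lower().split())
    resp_words = response.lower().split()
    return sum(w in ctx_words for w in resp_words) / max(len(resp_words), 1)

def coral_style_loss(sampled_token_probs, context, response):
    """REINFORCE-style sketch of a CORAL-like objective.

    The negative log-likelihood of a *sampled* response is scaled by a
    reward that scores context and response jointly, so any
    high-reward response is reinforced, not only the ground truth.
    """
    log_prob = sum(math.log(p) for p in sampled_token_probs)
    return -toy_reward(context, response) * log_prob
```

Under this sketch, a sampled response that the reward deems relevant contributes a scaled gradient signal, while an irrelevant one contributes little, which is the qualitative behavior the abstract attributes to CORAL.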
