Paper Title

Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation

Paper Authors

Shumpei Inoue, Tsungwei Liu, Nguyen Hong Son, Minh-Tien Nguyen

Paper Abstract

This paper introduces a model for incomplete utterance restoration (IUR) called JET (Joint learning token Extraction and Text generation). Unlike prior studies that work only on extraction or abstraction datasets, we design a simple but effective model that works for both IUR scenarios. Our design simulates the nature of IUR, where tokens omitted from the context contribute to restoration. From this, we construct a Picker that identifies the omitted tokens. To support the Picker, we design two label-creation methods (soft and hard labels), which work even when no annotated data for the omitted tokens is available. Restoration is done by a Generator with the help of the Picker via joint learning. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model outperforms pretrained T5 and non-generative language-model methods in both rich and limited training-data settings. The code is available at https://github.com/shumpei19/JET.
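
To make the label-creation idea concrete, below is a minimal sketch of one plausible reading of the hard-label scheme: a context token is marked as omitted if it appears in the gold restored utterance but not in the incomplete utterance. The function name `make_hard_labels` and the exact matching rule are illustrative assumptions, not the paper's actual procedure.

```python
# A minimal, hypothetical sketch of hard-label creation for the Picker.
# Assumption (not stated in the abstract): a context token gets label 1
# ("omitted") if it appears in the gold restored utterance but not in the
# incomplete utterance; all other context tokens get label 0.

def make_hard_labels(context_tokens, incomplete_tokens, restored_tokens):
    """Return one 0/1 label per context token marking likely omitted tokens."""
    incomplete = set(incomplete_tokens)
    restored = set(restored_tokens)
    return [
        1 if tok in restored and tok not in incomplete else 0
        for tok in context_tokens
    ]

# Toy example: the context mentions "Titanic", which the incomplete
# utterance refers to only as "it" and the restored utterance recovers.
context = ["I", "watched", "Titanic", "yesterday"]
incomplete = ["Why", "do", "you", "like", "it"]
restored = ["Why", "do", "you", "like", "Titanic"]
print(make_hard_labels(context, incomplete, restored))  # -> [0, 0, 1, 0]
```

A soft-label variant would presumably assign fractional scores instead of 0/1, and the joint objective would combine the Picker's token-classification loss with the Generator's sequence loss; both details here are guesses rather than the paper's specification.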
