Paper Title

Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation

Authors

Hao, Yongchang; He, Shilin; Jiao, Wenxiang; Tu, Zhaopeng; Lyu, Michael; Wang, Xing

Abstract

Non-Autoregressive machine Translation (NAT) models have demonstrated significant inference speedup but suffer from inferior translation accuracy. The common practice to tackle the problem is transferring the Autoregressive machine Translation (AT) knowledge to NAT models, e.g., with knowledge distillation. In this work, we hypothesize and empirically verify that AT and NAT encoders capture different linguistic properties of source sentences. Therefore, we propose to adopt Multi-Task learning to transfer the AT knowledge to NAT models through encoder sharing. Specifically, we take the AT model as an auxiliary task to enhance NAT model performance. Experimental results on WMT14 English-German and WMT16 English-Romanian datasets show that the proposed Multi-Task NAT achieves significant improvements over the baseline NAT models. Furthermore, the performance on large-scale WMT19 and WMT20 English-German datasets confirms the consistency of our proposed method. In addition, experimental results demonstrate that our Multi-Task NAT is complementary to knowledge distillation, the standard knowledge transfer method for NAT.
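To make the encoder-sharing idea concrete, the sketch below wires a single Transformer encoder to two decoders and trains them under a weighted joint objective such as L = L_NAT + λ·L_AT, with the AT branch acting as the auxiliary task. This is a minimal PyTorch sketch under assumed simplifications: the class name MultiTaskNAT, the uniform-copy inputs to the NAT decoder, and the weight lambda_at are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskNAT(nn.Module):
    """Illustrative sketch: one shared encoder feeding an AT decoder
    (auxiliary task) and a NAT decoder (primary task).
    Sizes and decoder-input schemes are assumptions, not the paper's."""

    def __init__(self, vocab_size, d_model=512, nhead=8, layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shared encoder: both tasks read the same source representation.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), layers)
        self.at_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), layers)
        self.nat_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_in_ids):
        memory = self.encoder(self.embed(src_ids))  # shared source encoding
        # AT branch: teacher-forced, causal mask so each position sees only the past.
        T = tgt_in_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        at_h = self.at_decoder(self.embed(tgt_in_ids), memory, tgt_mask=causal)
        # NAT branch: no causal mask; decoder inputs here are simply copied source
        # embeddings (real NAT models additionally predict the target length).
        nat_h = self.nat_decoder(self.embed(src_ids), memory)
        return self.out(at_h), self.out(nat_h)


def multitask_loss(at_logits, nat_logits, at_targets, nat_targets, lambda_at=0.5):
    """Joint objective L = L_NAT + lambda_at * L_AT; lambda_at is a hypothetical knob."""
    l_at = F.cross_entropy(at_logits.transpose(1, 2), at_targets)
    l_nat = F.cross_entropy(nat_logits.transpose(1, 2), nat_targets)
    return l_nat + lambda_at * l_at
```

Under this setup, the auxiliary AT branch would only shape the shared encoder during training; at inference one would decode with the NAT decoder alone, so the parallel-decoding speedup is preserved.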
