Paper Title
Universal Conditional Masked Language Pre-training for Neural Machine Translation
Paper Authors
Paper Abstract
Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Unlike prior works, where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a sequence-to-sequence model with a bidirectional decoder can produce notable performance gains for both Autoregressive and Non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages. We also introduce two simple but effective methods to enhance CeMAT: aligned code-switching & masking and dynamic dual-masking. We conduct extensive experiments and show that CeMAT achieves significant performance improvements in all scenarios, from low- to extremely high-resource languages, i.e., up to +14.4 BLEU on low-resource languages and +7.9 BLEU improvement on average for Autoregressive NMT. For Non-autoregressive NMT, we demonstrate that it also produces consistent performance gains, i.e., up to +5.3 BLEU. To the best of our knowledge, this is the first work to pre-train a unified model for fine-tuning on both NMT tasks. Code, data, and pre-trained models are available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/CeMAT.
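To make the dual-masking idea in the abstract concrete, the sketch below masks a dynamically sampled fraction of tokens on both the source and target sides of a sentence pair; the masked target then serves as input to a bidirectional decoder trained to recover the original tokens (a conditional masked LM objective). The function name dual_mask, the <mask> symbol, and the ratio range are illustrative assumptions, not CeMAT's actual implementation, which additionally uses aligned code-switching & masking as described in the paper.

```python
import random

MASK = "<mask>"


def dual_mask(src_tokens, tgt_tokens, min_ratio=0.1, max_ratio=0.5, seed=None):
    """Hypothetical sketch of dynamic dual-masking.

    Sample a masking ratio per sentence pair ("dynamic") and replace that
    fraction of tokens on BOTH the source and target sides ("dual") with a
    mask symbol. The model would be trained to predict the original tokens
    at the masked target positions, conditioned on the (partially masked)
    source and the unmasked target context.
    """
    rng = random.Random(seed)
    ratio = rng.uniform(min_ratio, max_ratio)  # re-sampled for every pair

    def mask_side(tokens):
        n_mask = max(1, int(round(ratio * len(tokens))))
        positions = set(rng.sample(range(len(tokens)), n_mask))
        return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

    return mask_side(src_tokens), mask_side(tgt_tokens)


if __name__ == "__main__":
    # Toy example: an English-German pair, whitespace-tokenized for simplicity.
    src = "we propose a conditional masked language model".split()
    tgt = "wir schlagen ein bedingtes maskiertes Sprachmodell vor".split()
    masked_src, masked_tgt = dual_mask(src, tgt, seed=0)
    print(masked_src)
    print(masked_tgt)
```

In this toy setup, the same sampled ratio is applied to both sides; in practice, how many positions are masked, and whether aligned word pairs are masked jointly, is a design choice of the pre-training recipe.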