Paper Title
Universal Conditional Masked Language Pre-training for Neural Machine Translation
Paper Authors
Paper Abstract
Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT). Unlike prior works, where pre-trained models usually adopt a unidirectional decoder, this paper demonstrates that pre-training a sequence-to-sequence model with a bidirectional decoder can produce notable performance gains for both Autoregressive and Non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages. We also introduce two simple but effective methods to enhance CeMAT: aligned code-switching & masking and dynamic dual-masking. We conduct extensive experiments and show that CeMAT achieves significant performance improvements in all scenarios, from low- to extremely high-resource languages, i.e., up to +14.4 BLEU on low-resource languages and +7.9 BLEU improvement on average for Autoregressive NMT. For Non-autoregressive NMT, we demonstrate that it also produces consistent performance gains, i.e., up to +5.3 BLEU. To the best of our knowledge, this is the first work to pre-train a unified model for fine-tuning on both NMT tasks. Code, data, and pre-trained models are available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/CeMAT.
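To make the dual-masking idea in the abstract concrete, the sketch below masks a dynamically sampled fraction of tokens on both the source and target sides of a sentence pair; the masked target then serves as input to a bidirectional decoder trained to recover the original tokens (a conditional masked LM objective). The function name dual_mask, the <mask> symbol, and the ratio range are illustrative assumptions, not CeMAT's actual implementation, which additionally uses aligned code-switching & masking as described in the paper.

```python
import random

MASK = "<mask>"


def dual_mask(src_tokens, tgt_tokens, min_ratio=0.1, max_ratio=0.5, seed=None):
    """Hypothetical sketch of dynamic dual-masking.

    Sample a masking ratio per sentence pair ("dynamic") and replace that
    fraction of tokens on BOTH the source and target sides ("dual") with a
    mask symbol. The model would be trained to predict the original tokens
    at the masked target positions, conditioned on the (partially masked)
    source and the unmasked target context.
    """
    rng = random.Random(seed)
    ratio = rng.uniform(min_ratio, max_ratio)  # re-sampled for every pair

    def mask_side(tokens):
        n_mask = max(1, int(round(ratio * len(tokens))))
        positions = set(rng.sample(range(len(tokens)), n_mask))
        return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

    return mask_side(src_tokens), mask_side(tgt_tokens)


if __name__ == "__main__":
    # Toy example: an English-German pair, whitespace-tokenized for simplicity.
    src = "we propose a conditional masked language model".split()
    tgt = "wir schlagen ein bedingtes maskiertes Sprachmodell vor".split()
    masked_src, masked_tgt = dual_mask(src, tgt, seed=0)
    print(masked_src)
    print(masked_tgt)
```

In this toy setup, the same sampled ratio is applied to both sides; in practice, how many positions are masked, and whether aligned word pairs are masked jointly, is a design choice of the pre-training recipe.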