Paper Title
Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation
Paper Authors
Paper Abstract
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT). Since AT and NAT can share the same model structure, and AT is an easier task than NAT due to its explicit dependency on previous target-side tokens, a natural idea is to gradually shift model training from the easier AT task to the harder NAT task. To smooth the shift from AT training to NAT training, in this paper we introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT contains a hyperparameter k, and each value of k defines an SAT task with a different degree of parallelism. In particular, SAT covers AT and NAT as its special cases: it reduces to AT when k = 1 and to NAT when k = N (N is the length of the target sentence). We design curriculum schedules to gradually shift k from 1 to N, with different pacing functions and numbers of tasks trained at the same time. We call our method task-level curriculum learning for NAT (TCL-NAT). Experiments on the IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines and reduces the performance gap between NAT and AT models to 1-2 BLEU points, demonstrating the effectiveness of our proposed method.
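The abstract describes a curriculum over SAT tasks indexed by k, driven by a pacing function. The sketch below is a minimal, hypothetical illustration of such a schedule, assuming a geometric ladder of k values (1, 2, 4, ..., N) and a linear pacing function; the function names (candidate_ks, linear_pacing, current_k) and the specific schedule are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a task-level curriculum over SAT tasks.
# Assumption: k values form a geometric ladder 1, 2, 4, ..., N and
# training progress is mapped to a task index by a linear pacing function.

def candidate_ks(max_target_len: int) -> list:
    """Parallelism degrees k = 1, 2, 4, ..., capped at N (the target length)."""
    ks, k = [], 1
    while k < max_target_len:
        ks.append(k)
        k *= 2
    ks.append(max_target_len)  # k = N corresponds to fully non-autoregressive decoding
    return ks

def linear_pacing(step: int, total_steps: int, num_tasks: int) -> int:
    """Map training progress in [0, 1] to an index over the SAT tasks."""
    progress = min(step / max(total_steps, 1), 1.0)
    return min(int(progress * num_tasks), num_tasks - 1)

def current_k(step: int, total_steps: int, max_target_len: int) -> int:
    """Select the SAT task (i.e., its k) to train on at this training step."""
    ks = candidate_ks(max_target_len)
    return ks[linear_pacing(step, total_steps, len(ks))]

# Example: training shifts from AT (k = 1) toward NAT (k = N) as steps grow.
if __name__ == "__main__":
    N, total = 32, 100_000
    for step in (0, 25_000, 50_000, 75_000, 99_999):
        print(step, current_k(step, total, N))
```

Other pacing functions (e.g., exponential or step-wise schedules) and training several adjacent k values at the same time fit the same interface by changing how progress is mapped to task indices.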