Paper Title

Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade

Paper Authors

Jiatao Gu, Xiang Kong

Paper Abstract

Fully non-autoregressive neural machine translation (NAT) predicts all output tokens simultaneously with a single forward pass of the neural network, which significantly reduces inference latency at the expense of a quality drop compared to the Transformer baseline. In this work, we aim to close the performance gap while maintaining the latency advantage. We first inspect the fundamental issues of fully NAT models and adopt dependency reduction in the learning space of output tokens as the basic guidance. We then revisit methods in four different aspects that have proven effective for improving NAT models, and carefully combine these techniques with the necessary modifications. Our extensive experiments on three translation benchmarks show that the proposed system achieves new state-of-the-art results for fully NAT models and obtains performance comparable to autoregressive and iterative NAT systems. For instance, one of the proposed models achieves 27.49 BLEU points on WMT14 En-De with an approximately 16.5x speedup at inference time.
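To make the latency contrast concrete, below is a minimal PyTorch sketch, not the authors' implementation, of the difference between fully non-autoregressive decoding, which fills every target position in one forward pass, and an autoregressive baseline that runs one forward pass per token. `ToyDecoder`, `src_states`, and all dimensions are hypothetical placeholders.

```python
# A minimal sketch (not the paper's code) contrasting single-forward-pass
# NAT decoding with token-by-token autoregressive decoding.
import torch
import torch.nn as nn

VOCAB, HIDDEN, MAX_LEN = 1000, 64, 8

class ToyDecoder(nn.Module):
    """Hypothetical decoder: maps encoded states to per-position logits."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, VOCAB)

    def forward(self, states):          # states: (batch, length, HIDDEN)
        return self.proj(states)        # logits: (batch, length, VOCAB)

decoder = ToyDecoder()
src_states = torch.randn(1, MAX_LEN, HIDDEN)  # stand-in for encoder output

# Fully non-autoregressive: ONE forward pass predicts every target token
# in parallel; this is the source of the large latency reduction.
with torch.no_grad():
    nat_tokens = decoder(src_states).argmax(dim=-1)       # (1, MAX_LEN)

# Autoregressive baseline: one forward pass PER token; each step may
# condition on previously generated tokens (omitted in this toy decoder).
ar_tokens = []
with torch.no_grad():
    for t in range(MAX_LEN):
        logits = decoder(src_states[:, : t + 1])          # re-run each step
        ar_tokens.append(logits[:, -1].argmax(dim=-1))

print(nat_tokens.shape)                      # produced in 1 forward pass
print(torch.stack(ar_tokens, dim=1).shape)   # produced in MAX_LEN passes
```

Because all positions are predicted independently in the NAT pass, output tokens cannot condition on one another; that is the dependency problem the paper targets with techniques that reduce dependencies in the learning space of output tokens.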
