Paper Title

Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES

Authors

Felix Stahlberg, Shankar Kumar

Abstract

The softmax layer in neural machine translation is designed to model the distribution over mutually exclusive tokens. Machine translation, however, is intrinsically uncertain: the same source sentence can have multiple semantically equivalent translations. Therefore, we propose to replace the softmax activation with a multi-label classification layer that can model ambiguity more effectively. We call our loss function Single-label Contrastive Objective for Non-Exclusive Sequences (SCONES). We show that the multi-label output layer can still be trained on single reference training data using the SCONES loss function. SCONES yields consistent BLEU score gains across six translation directions, particularly for medium-resource language pairs and small beam sizes. By using smaller beam sizes we can speed up inference by a factor of 3.9x and still match or improve the BLEU score obtained using softmax. Furthermore, we demonstrate that SCONES can be used to train NMT models that assign the highest probability to adequate translations, thus mitigating the "beam search curse". Additional experiments on synthetic language pairs with varying levels of uncertainty suggest that the improvements from SCONES can be attributed to better handling of ambiguity.
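The abstract does not spell out the loss itself, but its description (a multi-label output layer trained on single-reference data with a single-label contrastive objective) is consistent with replacing the softmax with per-token sigmoids trained via binary cross-entropy, where the reference token is the sole positive label and all other vocabulary entries are negatives. Below is a minimal PyTorch sketch under that assumption; the function name `scones_loss` and the negative-term weight `alpha` are hypothetical illustrations, not the paper's actual API or exact formulation.

```python
import torch
import torch.nn.functional as F

def scones_loss(logits: torch.Tensor, target_ids: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    """Sketch of a SCONES-style multi-label objective (assumed form).

    Each vocabulary entry gets an independent sigmoid instead of a
    softmax share. The single reference token is treated as the one
    positive label; every other token is a negative, weighted by
    `alpha` (a hyperparameter assumed here for illustration).

    logits:     (batch, vocab) unnormalized scores
    target_ids: (batch,) reference token ids
    """
    # Binary targets: 1 for the reference token, 0 elsewhere.
    targets = torch.zeros_like(logits)
    targets.scatter_(1, target_ids.unsqueeze(1), 1.0)

    # Per-entry binary cross-entropy; note there is no softmax
    # normalization, so token scores are not mutually exclusive.
    bce = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")

    # Positive term: push sigmoid(logit of the reference) toward 1.
    pos = bce.gather(1, target_ids.unsqueeze(1)).squeeze(1)

    # Negative term: push all other sigmoids toward 0, scaled by alpha.
    neg = bce.sum(dim=1) - pos

    return (pos + alpha * neg).mean()
```

Because each token's sigmoid score is computed independently, several semantically equivalent continuations can all receive high scores at once, which is one plausible reading of how such a layer "models ambiguity" as the abstract claims; at inference time the per-token scores would simply replace softmax probabilities in beam search.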
