Paper Title

Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English

Paper Authors

Tang, Gongbo, Sennrich, Rico, Nivre, Joakim

Paper Abstract

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we investigate pure character-based models in the case of translating Finnish into English, exploring their ability to learn word senses and morphological inflections as well as the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated in a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention mechanism to force character hidden states to capture the full word-level information. Experimental results show that word-level attention with a single head results in a drop of 1.2 BLEU points.
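
To make the "sparse word-level attention" idea from the abstract concrete, below is a minimal PyTorch sketch in which the decoder attends only to encoder hidden states at separator positions, so each boundary state is pushed to summarize its whole word. The function name, tensor shapes, and masking scheme are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch (assumed, not the paper's code): single-head attention that is
# sparsified to word-separator positions in a character-level encoder.
import torch
import torch.nn.functional as F

def sparse_word_level_attention(query, enc_states, char_ids, sep_id):
    """Attend only to encoder states at separator (word-boundary) positions.

    query:      (batch, d)          decoder state
    enc_states: (batch, src_len, d) character-level encoder hidden states
    char_ids:   (batch, src_len)    source character ids
    sep_id:     id of the separator character (e.g. the space between words)
    """
    d = query.size(-1)
    # Scaled dot-product scores over all character positions.
    scores = torch.einsum("bd,bsd->bs", query, enc_states) / d ** 0.5
    # Mask out every non-separator position, so all attention mass lands on
    # word-boundary states: the sparse word-level pattern the abstract describes.
    sep_mask = char_ids.eq(sep_id)
    scores = scores.masked_fill(~sep_mask, float("-inf"))
    probs = F.softmax(scores, dim=-1)             # (batch, src_len)
    return torch.einsum("bs,bsd->bd", probs, enc_states)

# Usage on toy tensors (positions 3 and 7 stand in for word boundaries).
batch, src_len, d, sep_id = 2, 10, 16, 0
query = torch.randn(batch, d)
enc_states = torch.randn(batch, src_len, d)
char_ids = torch.randint(1, 30, (batch, src_len))
char_ids[:, [3, 7]] = sep_id
context = sparse_word_level_attention(query, enc_states, char_ids, sep_id)
print(context.shape)                              # torch.Size([2, 16])
```

Masking to separator positions is one plausible way to realize the constraint; it also illustrates why the abstract's finding is informative: if boundary states alone carried all word-level information, this restriction would be harmless, whereas the reported 1.2 BLEU drop suggests the information is spread across the character sequence.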
