Paper Title

How do lexical semantics affect translation? An empirical study

Authors

Subramanian, Vivek; Sundararaman, Dhanasekar

Abstract

Neural machine translation (NMT) systems aim to map text from one language into another. While NMT has a wide variety of applications, one of the most important is the translation of natural language. A distinguishing feature of natural language is that words are typically ordered according to the grammatical rules of a given language. Although many advances have been made in developing NMT systems for translating natural language, little research has been done on how word ordering in, and lexical similarity between, the source and target languages affect translation performance. Here, we investigate these relationships on a variety of low-resource language pairs from the OpenSubtitles2016 database, where the source language is English, and find that the more similar the target language is to English, the better the translation performance. In addition, we study the impact of providing NMT models with part-of-speech (POS) tags for the words in the English sequence and find that, for Transformer-based models, the more dissimilar the target language is from English, the greater the benefit provided by POS.
