Paper title
Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers
Authors
Abstract
We investigate different approaches to translating between similar languages under low-resource conditions, as part of our contribution to the WMT 2020 Similar Languages Translation Shared Task. We submitted Transformer-based bilingual and multilingual systems for all language pairs, in both directions. We also leverage back-translation for one of the language pairs, gaining an improvement of more than 3 BLEU points. We interpret our results in light of the degree of mutual intelligibility (based on Jaccard similarity) between each pair, finding a positive correlation between mutual intelligibility and model performance. Our Spanish-Catalan model has the best performance of all five language pairs. Except for the case of Hindi-Marathi, our bilingual models achieve better performance than the multilingual models on all pairs.
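The abstract names Jaccard similarity as the basis for its mutual intelligibility measure but does not say which units are compared. As a rough illustration only, the sketch below assumes the comparison is made over lowercased, whitespace-tokenized word vocabularies; the function names `vocabulary` and `jaccard_similarity` and the toy sentences are hypothetical and are not taken from the paper.

```python
def vocabulary(sentences):
    """Collect the set of lowercased whitespace tokens in a list of sentences.

    Assumption: word-level vocabulary; the paper may use other units.
    """
    return {token for sentence in sentences for token in sentence.lower().split()}


def jaccard_similarity(items_a, items_b):
    """Return |A intersect B| / |A union B| for two collections of items."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        # Two empty vocabularies: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Toy sentences for illustration only, not data from the shared task.
    spanish = ["la casa es grande", "el libro es nuevo"]
    catalan = ["la casa és gran", "el llibre és nou"]
    score = jaccard_similarity(vocabulary(spanish), vocabulary(catalan))
    print(f"Vocabulary overlap (Jaccard): {score:.3f}")
```

Under this reading, a higher score means the two languages share a larger fraction of their surface vocabulary, which is the quantity the abstract reports as correlating positively with model performance.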