Paper Title
Neural Machine Translation with Phrase-Level Universal Visual Representations
Paper Authors
Paper Abstract
Multimodal machine translation (MMT) aims to improve neural machine translation (NMT) with additional visual information, but most existing MMT methods require paired input of a source sentence and an image, which makes them suffer from the shortage of sentence-image pairs. In this paper, we propose a phrase-level retrieval-based method for MMT that obtains visual information for the source input from existing sentence-image datasets, so that MMT is no longer limited to paired sentence-image input. Our method performs retrieval at the phrase level and hence learns visual information from pairs of source phrases and grounded regions, which can mitigate data sparsity. Furthermore, our method employs a conditional variational auto-encoder to learn visual representations that filter out redundant visual information and retain only the information related to the phrase. Experiments show that the proposed method significantly outperforms strong baselines on multiple MMT datasets, especially when the textual context is limited.
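To make the two components described in the abstract concrete, here is a minimal PyTorch sketch of (1) phrase-level retrieval of grounded region features from an image-text index and (2) a conditional variational auto-encoder (CVAE) that compresses a retrieved region into a phrase-conditioned latent representation. All names (`retrieve_regions`, `PhraseCVAE`), dimensions, and the specific loss form are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of phrase-level retrieval + CVAE-based visual representation.
# Assumed sizes: text embedding 512, region feature 2048, latent 256.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_TXT, D_IMG, D_LAT = 512, 2048, 256

def retrieve_regions(phrase_emb: torch.Tensor,
                     index_embs: torch.Tensor,
                     index_feats: torch.Tensor,
                     top_k: int = 5) -> torch.Tensor:
    """Return region features grounded to the top-k most similar indexed phrases.

    phrase_emb:  (D_TXT,)   embedding of one source phrase
    index_embs:  (N, D_TXT) embeddings of phrases in the sentence-image index
    index_feats: (N, D_IMG) region features grounded to those phrases
    """
    sims = F.cosine_similarity(index_embs, phrase_emb.unsqueeze(0))  # (N,)
    top = sims.topk(top_k).indices
    return index_feats[top]  # (top_k, D_IMG)

class PhraseCVAE(nn.Module):
    """CVAE that keeps only phrase-relevant information from a region feature."""
    def __init__(self):
        super().__init__()
        self.post = nn.Linear(D_IMG + D_TXT, 2 * D_LAT)  # posterior q(z | region, phrase)
        self.prior = nn.Linear(D_TXT, 2 * D_LAT)         # prior p(z | phrase)
        self.dec = nn.Linear(D_LAT + D_TXT, D_IMG)       # reconstruct the region

    def forward(self, region: torch.Tensor, phrase: torch.Tensor):
        mu, logvar = self.post(torch.cat([region, phrase], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(torch.cat([z, phrase], -1))
        p_mu, p_logvar = self.prior(phrase).chunk(2, -1)
        # KL(q || p) between two diagonal Gaussians; regularizes z toward
        # what is predictable from the phrase alone, discarding the rest.
        kl = 0.5 * (p_logvar - logvar
                    + (logvar.exp() + (mu - p_mu) ** 2) / p_logvar.exp()
                    - 1).sum(-1)
        loss = F.mse_loss(recon, region) + kl.mean()
        return z, loss
```

At translation time, only the phrase-conditioned latent `z` (sampled from the prior network, since no paired image is available) would be fed to the NMT decoder as the visual context; this mirrors the abstract's claim that the method breaks the paired sentence-image input requirement, though the exact integration is a design assumption here.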