Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Paper Title

Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Authors

Ji, Baijun; Zhang, Tong; Zou, Yicheng; Hu, Bojie; Shen, Si

Abstract

Multimodal machine translation (MMT) aims to improve translation quality by pairing the source sentence with its corresponding image. Despite promising performance, MMT models still suffer from the problem of input degradation: models focus more on textual information, while visual information is generally overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information theoretic perspective. Specifically, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We use mutual information to quantify them and propose two methods for objective optimization to better leverage visual signals. Experiments on two datasets demonstrate that our approach can effectively enhance the visual awareness of MMT models and achieve superior results against strong baselines.

Paper Title