Paper Title

WACO: Word-Aligned Contrastive Learning for Speech Translation

Authors

Siqi Ouyang, Rong Ye, Lei Li

Abstract

End-to-end Speech Translation (E2E ST) aims to directly translate source speech into target text. Existing ST methods perform poorly when only extremely small speech-text data are available for training. We observe that an ST model's performance closely correlates with its embedding similarity between speech and source transcript. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is bridging word-level representations for both speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on a low-resource direction Maltese-English from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by 9+ BLEU points with only 1-hour parallel ST data. Code is available at https://github.com/owaski/WACO.
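To make the key idea concrete, here is a minimal sketch of a word-aligned contrastive objective in the spirit the abstract describes: each speech word and its transcript word form a positive pair, other words in the batch serve as negatives, and an InfoNCE-style loss pulls matched pairs together. This is an illustrative assumption, not the authors' released implementation; the function names, the mean-pooling over word spans, and the temperature value are all hypothetical choices.

```python
import numpy as np

def pool_word_spans(frames, spans):
    """Mean-pool encoder outputs over each word's (start, end) span.

    frames: (T, D) array of frame- or token-level representations.
    spans:  list of (start, end) index pairs, one per word.
    Returns an (N, D) array of word-level representations.
    """
    return np.stack([frames[a:b].mean(axis=0) for a, b in spans])

def word_contrastive_loss(speech_words, text_words, temperature=0.1):
    """InfoNCE loss over aligned word-level embeddings.

    speech_words, text_words: (N, D) arrays where row i of each side
    represents the same word. Matched rows are positives; all other
    rows in the batch act as in-batch negatives.
    """
    s = speech_words / np.linalg.norm(speech_words, axis=-1, keepdims=True)
    t = text_words / np.linalg.norm(text_words, axis=-1, keepdims=True)
    logits = (s @ t.T) / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal
```

In this sketch, minimizing the loss makes each pooled speech-word embedding most similar to its own transcript word, which is one plausible way to realize the word-level modality bridging the abstract refers to.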
