Paper title
End-to-End Speech Translation of Arabic to English Broadcast News
Paper authors
Abstract
Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. The ST task has long been addressed with a pipeline approach consisting of two modules: automatic speech recognition (ASR) in the source language, followed by text-to-text machine translation (MT). In the past few years, we have seen a paradigm shift towards end-to-end approaches using sequence-to-sequence deep neural network models. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic-to-English speech translation system. Starting from independent ASR and MT LDC releases, we were able to identify about 92 hours of Arabic audio recordings for which the manual transcription was also translated into English at the segment level. These data were used to train and compare pipeline and end-to-end speech translation systems under multiple scenarios, including transfer learning and data augmentation techniques.