Paper title
Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines
Paper authors
Paper abstract
In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation. ASR output segmentation is crucial, as ASR systems segment the input audio using purely acoustic information and are not guaranteed to output sentence-like segments. Since most MT systems expect sentences as input, feeding in longer unsegmented passages can lead to sub-optimal performance. We explore the feasibility of using datasets of subtitles from TV shows and movies to train better ASR segmentation models. We further incorporate part-of-speech (POS) tag and dependency label information (derived from the unsegmented ASR outputs) into our segmentation model. We show that this noisy syntactic information can improve model accuracy. We evaluate our models intrinsically on segmentation quality and extrinsically on downstream MT performance, as well as downstream tasks including cross-lingual information retrieval (CLIR) tasks and human relevance assessments. Our model shows improved performance on downstream tasks for Lithuanian and Bulgarian.
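One way to picture how subtitles could supply training data for a segmentation model is sketched below. This is a minimal illustration under assumptions, not the authors' implementation: the function names `subtitles_to_training_example` and `apply_boundaries`, the lowercasing and punctuation stripping used to imitate ASR output, and the binary end-of-sentence label scheme are all hypothetical choices made for this example.

```python
import re
from typing import List, Tuple


def subtitles_to_training_example(sentences: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten subtitle sentences into an ASR-like token stream with boundary labels.

    Each token gets label 1 if it ends a subtitle sentence, else 0.
    The normalization (lowercase, no punctuation) is an assumption meant to
    mimic unpunctuated ASR output.
    """
    tokens: List[str] = []
    labels: List[int] = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        if not words:
            continue  # skip lines that contain only punctuation
        tokens.extend(words)
        labels.extend([0] * (len(words) - 1) + [1])
    return tokens, labels


def apply_boundaries(tokens: List[str], labels: List[int]) -> List[str]:
    """Split a token stream into sentence-like segments at predicted boundaries."""
    segments: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == 1:
            segments.append(" ".join(current))
            current = []
    if current:  # trailing tokens without a final boundary
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    toks, labs = subtitles_to_training_example(["Where are you going?", "Home."])
    print(toks, labs)
    print(apply_boundaries(toks, labs))
```

In a pipeline of the kind the abstract describes, a trained tagger would predict the labels for real ASR output, possibly with POS tags and dependency labels as extra per-token features, and the resulting segments would then be passed to the MT system.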