XTREME-S: Evaluating Cross-lingual Speech Representations

Paper Title

XTREME-S: Evaluating Cross-lingual Speech Representations

Authors

Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan Van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Abstract

We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.
