Paper title
XTREME-S: Evaluating Cross-lingual Speech Representations
Authors
Abstract
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.