Paper title
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Paper authors
Paper abstract
Self-supervised learning (SSL) of rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks. Because it mitigates the need for large amounts of transcribed speech, it has driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly large, which conflicts with limited on-device resources. This gap can be even more severe in multilingual/multitask scenarios that require simultaneously recognizing multiple languages or executing multiple speech processing tasks. Additionally, strongly overparameterized speech SSL models tend to suffer from overfitting when finetuned on low-resource speech corpora. This work aims to enhance the practical usage of speech SSL models toward a win-win of improved efficiency and alleviated overfitting via our proposed S$^3$-Router framework, which for the first time discovers that simply discarding no more than 10\% of model weights, via finetuning only the model connections of speech SSL models, can achieve better accuracy than standard weight finetuning on downstream speech processing tasks. More importantly, S$^3$-Router can serve as an all-in-one technique to enable (1) a new finetuning scheme, (2) an efficient multilingual/multitask solution, (3) a state-of-the-art ASR pruning technique, and (4) a new tool to quantitatively analyze the learned speech representations. We believe S$^3$-Router provides a new perspective on the practical deployment of speech SSL models. Our code is available at: https://github.com/GATECH-EIC/S3-Router.
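The core idea described above, finetuning connections rather than weights by learning a binary mask over a frozen pretrained model, can be illustrated with a minimal sketch. This is not the authors' implementation: the per-weight scores, the threshold selection, and the 10\% discard cap are illustrative assumptions standing in for the actual S$^3$-Router training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_mask(scores, max_discard=0.10):
    """Keep the highest-scoring weights; discard at most `max_discard`
    of them (mirroring the 'no more than 10% discarded' finding)."""
    k = int(scores.size * max_discard)  # number of weights to discard
    if k == 0:
        return np.ones_like(scores, dtype=bool)
    # Threshold at the k-th smallest score; everything above it is kept.
    thresh = np.partition(scores.ravel(), k - 1)[k - 1]
    return scores > thresh              # False = discarded connection

# Frozen pretrained weights: never updated during finetuning.
W = rng.normal(size=(64, 32))
# Per-weight scores: in this scheme, the ONLY finetuned parameters.
scores = rng.normal(size=W.shape)

mask = binary_mask(scores)
x = rng.normal(size=(8, 64))
y = x @ (W * mask)                      # forward pass through masked weights

discarded = 1.0 - mask.mean()
print(f"discarded fraction: {discarded:.3f}")
print(y.shape)
```

In an actual training loop the scores would be updated by gradient descent (e.g. with a straight-through estimator for the non-differentiable thresholding), while `W` stays fixed at its pretrained values, so each downstream language or task needs only its own lightweight binary mask rather than a full copy of the model.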