Paper Title

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Paper Authors

Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

Paper Abstract

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These improvements have mostly been reported on highly curated datasets, such as LibriSpeech, for non-streaming end-to-end ASR models. However, a pivotal characteristic of SSL is that it can be applied to any untranscribed audio data. In this paper, we provide a full exploration of how to utilize uncurated audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model. More specifically, we present (1) the effect of an Audio Event Detection (AED) model in the data pre-processing pipeline, (2) an analysis of optimizer choice and learning-rate scheduling, (3) a comparison of recently developed contrastive losses, and (4) a comparison of various pre-training strategies, such as in-domain versus out-of-domain pre-training data, monolingual versus multilingual pre-training data, multi-head versus single-head multilingual SSL, and supervised pre-training versus SSL. The experimental results show that SSL pre-training with in-domain uncurated data achieves better performance than all the alternative out-of-domain pre-training strategies.
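
As context for item (3), the sketch below shows a wav2vec 2.0-style contrastive (InfoNCE) loss, the family of losses the paper compares. This is a minimal illustrative sketch under common assumptions, not the paper's implementation; the function name contrastive_loss, the tensor shapes, and the temperature value are all hypothetical.

import torch
import torch.nn.functional as F

def contrastive_loss(context, positive, negatives, temperature=0.1):
    # Illustrative wav2vec 2.0-style InfoNCE loss; not the paper's code.
    # context:   (B, T, D) contextualized encoder outputs at masked positions
    # positive:  (B, T, D) quantized target for each masked position
    # negatives: (B, T, K, D) K distractor targets sampled from other positions
    pos_sim = F.cosine_similarity(context, positive, dim=-1)               # (B, T)
    neg_sim = F.cosine_similarity(context.unsqueeze(2), negatives, dim=-1) # (B, T, K)
    # Put the positive in slot 0 so the target class index is always 0.
    logits = torch.cat([pos_sim.unsqueeze(-1), neg_sim], dim=-1) / temperature
    targets = torch.zeros(logits.shape[:-1], dtype=torch.long, device=logits.device)
    # Cross-entropy over the (1+K)-way classification at every masked frame.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))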
