A context-aware knowledge transferring strategy for CTC-based ASR

Paper title

A context-aware knowledge transferring strategy for CTC-based ASR

Authors

Ke-Han Lu, Kuan-Yu Chen

Abstract

Non-autoregressive automatic speech recognition (ASR) modeling has received increasing attention recently because of its fast decoding speed and superior performance. Among representative approaches, methods based on connectionist temporal classification (CTC) remain the dominant stream. However, a theoretically inherent flaw, the assumption of independence between tokens, creates a performance barrier for this line of work. To mitigate the challenge, we propose a context-aware knowledge transferring strategy for CTC-based ASR, consisting of a knowledge transferring module and a context-aware training strategy. The former is designed to distill linguistic information from a pre-trained language model, and the latter is designed to mitigate the limitations caused by the conditional independence assumption. As a result, a knowledge-injected, context-aware CTC-based ASR model built upon wav2vec2.0 is presented in this paper. A series of experiments on the AISHELL-1 and AISHELL-2 datasets demonstrates the effectiveness of the proposed method.
