Paper Title
Self-critical Sequence Training for Automatic Speech Recognition
Paper Authors
Paper Abstract
Although the automatic speech recognition (ASR) task has achieved remarkable success with sequence-to-sequence models, there are two main mismatches between training and testing that might lead to performance degradation: 1) the commonly used cross-entropy criterion aims to maximize the log-likelihood of the training data, while performance is evaluated by word error rate (WER), not log-likelihood; 2) the teacher-forcing method leads to dependence on the ground truth during training, which means the model has never been exposed to its own predictions before testing. In this paper, we propose an optimization method called self-critical sequence training (SCST) to bring the training procedure much closer to the testing phase. As a reinforcement learning (RL) based method, SCST utilizes a customized reward function to associate the training criterion with WER. Furthermore, it removes the reliance on teacher forcing and harmonizes the model with its inference procedure. We conducted experiments on both clean and noisy speech datasets, and the results show that the proposed SCST achieves relative improvements of 8.7% and 7.8%, respectively, over the baseline in terms of WER.
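
To make the abstract's description concrete, the following is a minimal sketch of one SCST training step, assuming a PyTorch-style sequence-to-sequence ASR model. The method names model.sample, model.greedy_decode, and model.ids_to_text, and the use of negative WER as the customized reward, are illustrative assumptions rather than the paper's exact implementation.

import torch
import jiwer  # library for computing word error rate


def scst_loss(model, speech, reference_text):
    # Sampled hypothesis and its token log-probabilities (exploration).
    sampled_ids, log_probs = model.sample(speech)          # hypothetical API
    # Greedy (inference-time) hypothesis used as the self-critical baseline.
    with torch.no_grad():
        greedy_ids = model.greedy_decode(speech)            # hypothetical API

    sampled_text = model.ids_to_text(sampled_ids)           # hypothetical API
    greedy_text = model.ids_to_text(greedy_ids)

    # Reward that ties the training criterion to WER: higher is better.
    r_sample = -jiwer.wer(reference_text, sampled_text)
    r_greedy = -jiwer.wer(reference_text, greedy_text)

    # REINFORCE with the greedy score as baseline; no ground-truth prefixes
    # are fed to the decoder, so training matches the inference procedure.
    advantage = r_sample - r_greedy
    return -(advantage * log_probs.sum())

In this sketch, hypotheses that beat the model's own greedy decoding receive a positive advantage and are reinforced, while worse samples are suppressed, which is the self-critical mechanism the abstract refers to.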