Paper Title
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT
Paper Authors
Paper Abstract
Second-pass rescoring is an important component of automatic speech recognition (ASR) systems, used to improve the output of a first-pass decoder by rescoring a lattice or re-ranking an $n$-best list. While pretraining with a masked language model (MLM) objective has achieved great success on various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model such as BERT with a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, incorporating the benefits of a discriminative loss into the fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates MLM into the discriminative training process to effectively distill knowledge from the pretrained model. We further propose an alternative discriminative loss. We call this approach RescoreBERT; it reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
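For illustration, below is a minimal sketch of the standard MWER objective for $n$-best rescoring that the abstract refers to. It is not taken from the paper: it assumes PyTorch, treats model scores as negative log-likelihoods (lower is better), and the function name, tensor layout, and the simple additive combination of first- and second-pass scores are hypothetical choices for the sketch.

import torch
import torch.nn.functional as F

def mwer_loss(second_pass_scores: torch.Tensor,
              first_pass_scores: torch.Tensor,
              word_errors: torch.Tensor) -> torch.Tensor:
    # All tensors have shape [batch, n_best]; scores are negative log-likelihood
    # style, so a lower score means a better hypothesis.
    total_scores = first_pass_scores + second_pass_scores
    # Hypothesis posteriors: softmax over the negated combined scores within each n-best list.
    hyp_probs = F.softmax(-total_scores, dim=-1)
    # Center the error counts so the loss depends on the relative ranking of hypotheses.
    avg_errors = word_errors.mean(dim=-1, keepdim=True)
    # Expected (relative) number of word errors under the model's posterior.
    return (hyp_probs * (word_errors - avg_errors)).sum(dim=-1).mean()

# Usage example with a batch of one utterance and a 4-best list (numbers are made up):
sp = torch.tensor([[2.3, 1.7, 3.1, 2.9]], requires_grad=True)  # second-pass (e.g. BERT-based) scores
fp = torch.tensor([[10.2, 11.0, 9.8, 12.4]])                   # first-pass decoder scores
errs = torch.tensor([[1.0, 0.0, 2.0, 3.0]])                    # word errors vs. the reference
loss = mwer_loss(sp, fp, errs)
loss.backward()  # gradients flow back through the second-pass scores during training

Minimizing this loss pushes probability mass toward hypotheses with fewer word errors, which is what makes the objective discriminative rather than purely likelihood-based.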