Paper Title
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT
Paper Authors
Paper Abstract
Second-pass rescoring is an important component of automatic speech recognition (ASR) systems, used to improve the output of a first-pass decoder by rescoring a lattice or re-ranking an $n$-best list. While pretraining with a masked language model (MLM) objective has achieved great success on various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model such as BERT with a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, incorporating the benefits of a discriminative loss into the fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates MLM into the discriminative training process to effectively distill knowledge from the pretrained model. We further propose an alternative discriminative loss. We call this approach RescoreBERT; it reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
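For illustration, below is a minimal sketch of the standard MWER objective for $n$-best rescoring that the abstract refers to. It is not taken from the paper: it assumes PyTorch, treats model scores as negative log-likelihoods (lower is better), and the function name, tensor layout, and the simple additive combination of first- and second-pass scores are hypothetical choices for the sketch.

import torch
import torch.nn.functional as F

def mwer_loss(second_pass_scores: torch.Tensor,
              first_pass_scores: torch.Tensor,
              word_errors: torch.Tensor) -> torch.Tensor:
    # All tensors have shape [batch, n_best]; scores are negative log-likelihood
    # style, so a lower score means a better hypothesis.
    total_scores = first_pass_scores + second_pass_scores
    # Hypothesis posteriors: softmax over the negated combined scores within each n-best list.
    hyp_probs = F.softmax(-total_scores, dim=-1)
    # Center the error counts so the loss depends on the relative ranking of hypotheses.
    avg_errors = word_errors.mean(dim=-1, keepdim=True)
    # Expected (relative) number of word errors under the model's posterior.
    return (hyp_probs * (word_errors - avg_errors)).sum(dim=-1).mean()

# Usage example with a batch of one utterance and a 4-best list (numbers are made up):
sp = torch.tensor([[2.3, 1.7, 3.1, 2.9]], requires_grad=True)  # second-pass (e.g. BERT-based) scores
fp = torch.tensor([[10.2, 11.0, 9.8, 12.4]])                   # first-pass decoder scores
errs = torch.tensor([[1.0, 0.0, 2.0, 3.0]])                    # word errors vs. the reference
loss = mwer_loss(sp, fp, errs)
loss.backward()  # gradients flow back through the second-pass scores during training

Minimizing this loss pushes probability mass toward hypotheses with fewer word errors, which is what makes the objective discriminative rather than purely likelihood-based.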