Paper Title
Neural-FST Class Language Model for End-to-End Speech Recognition
Paper Authors
Paper Abstract
We propose the Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework. Our method uses a background NNLM, which models generic background text, together with a collection of domain-specific entities modeled as individual FSTs. Each output token is generated by a mixture of these components; the mixture weights are estimated with a separately trained neural decider. We show that NFCLM significantly outperforms an NNLM baseline by 15.8% relative in terms of Word Error Rate. NFCLM achieves performance similar to traditional shallow fusion of an NNLM and FSTs while being less prone to overbiasing and 12 times more compact, making it better suited to on-device usage.
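The mixture described in the abstract can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the tensor shapes, and the representation of per-FST token distributions are all hypothetical; only the idea taken from the abstract, a decider-weighted mixture over one background NNLM component and several entity-FST components, is grounded in the source.

```python
import torch
import torch.nn.functional as F

def nfclm_next_token_probs(background_logits, fst_probs, decider_logits):
    """Mix background-NNLM and per-FST next-token distributions.

    background_logits: (vocab,) unnormalized scores from the background NNLM.
    fst_probs: (num_fsts, vocab) token distributions read off each
        domain-specific entity FST in its current state (hypothetical
        representation; the paper defines the exact FST scoring).
    decider_logits: (1 + num_fsts,) scores from the separately trained
        neural decider, one per mixture component.
    """
    weights = F.softmax(decider_logits, dim=-1)        # mixture weights
    background = F.softmax(background_logits, dim=-1)  # NNLM distribution
    # Stack the background distribution with the FST distributions and
    # take the decider-weighted combination over the vocabulary.
    components = torch.cat([background.unsqueeze(0), fst_probs], dim=0)
    return weights @ components                        # (vocab,)

if __name__ == "__main__":
    vocab, num_fsts = 8, 2
    probs = nfclm_next_token_probs(
        torch.randn(vocab),
        F.softmax(torch.randn(num_fsts, vocab), dim=-1),
        torch.randn(1 + num_fsts))
    # The result is itself a valid distribution over the vocabulary.
    assert torch.isclose(probs.sum(), torch.tensor(1.0))
```

Because the decider's softmax weights sum to one and each component is a normalized distribution, the mixture output is guaranteed to be a proper probability distribution, which is what makes the combination mathematically consistent rather than a heuristic score interpolation.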