Online Automatic Speech Recognition with Listen, Attend and Spell Model

Title

Online Automatic Speech Recognition with Listen, Attend and Spell Model

Authors

Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal

Abstract

The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of the online attention mechanism at the edge of input buffers. We propose a novel and simple technique that can achieve fully online recognition while meeting accuracy and latency targets. For the Mandarin dictation task, our proposed approach can achieve a character error rate in online operation that is within 4% relative to an offline LAS model. The proposed online LAS model operates at 12% lower latency relative to a conventional neural network hidden Markov model hybrid of comparable accuracy. We have validated the proposed method through a production-scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.
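The abstract reports accuracy as character error rate (CER) and states both accuracy and latency as relative differences. As a minimal sketch, assuming the standard definition and not the authors' own evaluation tooling, CER is the character-level Levenshtein distance between hypothesis and reference divided by the reference length, and "within 4% relative" means (CER_online − CER_offline) / CER_offline ≤ 0.04. The helper names `edit_distance`, `cer`, and `relative_change` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref[i-1]
                curr[j - 1] + 1,         # insertion of hyp[j-1]
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_change(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline` (0.04 = 4% higher, -0.12 = 12% lower)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical Mandarin example: one substituted character out of six.
    print(cer("今天天气很好", "今天天汽很好"))
```

On a real test set, CER is usually computed at corpus level (total errors divided by total reference characters) rather than averaged per utterance. The same `relative_change` applies to latency, where a negative value indicates lower latency than the baseline.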
