Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

Title

Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

Authors

Peng Shen, Xugang Lu, Hisashi Kawai

Abstract

For Mandarin end-to-end (E2E) automatic speech recognition (ASR), pronunciation-based modeling units can improve the sharing of modeling units during model training compared with character-based units, but they suffer from the homophone problem. In this study, we propose a novel pronunciation-aware unique character encoding for building E2E RNN Transducer (RNN-T)-based Mandarin ASR systems. The proposed encoding combines a pronunciation-based syllable with a character index (CI). By introducing the CI, the RNN-T model can overcome the homophone problem while still using pronunciation information to extract modeling units. With the proposed encoding, the model outputs can be converted into the final recognition result through a one-to-one mapping. We conducted experiments on the Aishell and MagicData datasets, and the experimental results showed the effectiveness of the proposed method.
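The idea of pairing a syllable with a character index can be illustrated with a small sketch. It assumes tone-marked pinyin syllables, a hypothetical token format `syllable_CI`, and character indices assigned per syllable in lexicon order (for example, by descending character frequency). The helpers `build_encoding` and `tokens_to_text` and the toy lexicon are hypothetical and for illustration only; they are not taken from the paper, whose exact indexing rule may differ.

```python
from collections import defaultdict


def build_encoding(lexicon):
    """Assign a unique 'syllable_CI' token to every (character, syllable) pair.

    Hypothetical scheme for illustration: the character index (CI) counts
    homophones within one syllable in the order they appear in `lexicon`.
    """
    next_index = defaultdict(int)  # per-syllable counter for the CI
    encode = {}                    # (character, syllable) -> token
    decode = {}                    # token -> character (each token maps to one character)
    for char, syllable in lexicon:
        if (char, syllable) in encode:
            continue  # skip duplicate lexicon entries
        next_index[syllable] += 1
        token = f"{syllable}_{next_index[syllable]}"
        encode[(char, syllable)] = token
        decode[token] = char
    return encode, decode


def tokens_to_text(tokens, decode):
    """Convert a model output token sequence back to characters by table lookup."""
    return "".join(decode[t] for t in tokens)


# Toy lexicon: 是/市/事 share "shi4" and 城/成 share "cheng2" (homophones).
lexicon = [("是", "shi4"), ("市", "shi4"), ("事", "shi4"),
           ("城", "cheng2"), ("成", "cheng2")]
encode, decode = build_encoding(lexicon)

print(encode[("市", "shi4")])                           # shi4_2
print(tokens_to_text(["cheng2_1", "shi4_2"], decode))   # 城市
```

Homophones still share the syllable part of their tokens, which is how pronunciation information can be exposed to the model, while the index keeps every token distinct so decoding is a plain dictionary lookup with no language-model disambiguation step.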
