Extending RNN-T-based speech recognition systems with emotion and language classification

Paper title

Extending RNN-T-based speech recognition systems with emotion and language classification

Authors

Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon

Abstract

Speech transcription, emotion recognition, and language identification are usually considered to be three different tasks. Each one requires a different model with a different architecture and training process. We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that can be used for emotion recognition and language identification as well as for speech recognition. Our work extends the STT system for emotion classification through minimal changes, and shows successful results on the IEMOCAP and MELD datasets. In addition, we demonstrate that by adding a lightweight component to the RNN-T module, it can also be used for language identification. In our evaluations, this new classifier demonstrates state-of-the-art accuracy for the NIST-LRE-07 dataset.
