Paper title
Extending RNN-T-based speech recognition systems with emotion and language classification
Authors
Abstract
Speech transcription, emotion recognition, and language identification are usually treated as three separate tasks, each requiring its own model with a different architecture and training process. We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that serves emotion recognition and language identification as well as speech recognition. Our work extends the STT system to emotion classification with minimal changes and shows successful results on the IEMOCAP and MELD datasets. In addition, we demonstrate that adding a lightweight component to the RNN-T module allows it to be used for language identification as well. In our evaluations, this new classifier achieves state-of-the-art accuracy on the NIST-LRE-07 dataset.
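The abstract does not say how the shared component and the added classifiers are wired together, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes the STT encoder yields one feature vector per audio frame, that those vectors are mean-pooled into a single utterance vector, and that each task gets its own small linear-plus-softmax head. The names `fake_encoder`, `mean_pool`, and `LinearHead`, the feature size, and the class counts are all hypothetical placeholders.

```python
import math
import random


def fake_encoder(num_frames, dim, seed=1):
    """Stand-in for a pretrained RNN-T encoder (assumption): random frame features."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]


def mean_pool(frames):
    """Average per-frame feature vectors into one utterance-level vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical lightweight classifier: one linear layer followed by softmax."""

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Untrained random weights; a real head would be trained on labeled data.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def __call__(self, utterance_vec):
        logits = [
            sum(w * x for w, x in zip(row, utterance_vec)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)


FEATURE_DIM = 8      # placeholder encoder output size
NUM_EMOTIONS = 4     # placeholder class count
NUM_LANGUAGES = 14   # placeholder class count

# One shared encoder pass feeds both task-specific heads.
frames = fake_encoder(num_frames=50, dim=FEATURE_DIM)
utterance_vec = mean_pool(frames)

emotion_probs = LinearHead(FEATURE_DIM, NUM_EMOTIONS, seed=2)(utterance_vec)
language_probs = LinearHead(FEATURE_DIM, NUM_LANGUAGES, seed=3)(utterance_vec)
```

In this arrangement the expensive encoder runs once per utterance and each additional task costs only one small matrix product, which is one way to read "lightweight component" in the abstract.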