x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

Paper Title

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

Authors

Pappagari, Raghavendra; Wang, Tianzi; Villalba, Jesus; Chen, Nanxin; Dehak, Najim

Abstract

In this work, we explore the dependencies between speaker recognition and emotion recognition. We first show that knowledge learned for speaker recognition can be reused for emotion recognition through transfer learning. Then, we show the effect of emotion on speaker recognition. For emotion recognition, we show that a simple linear model trained on features extracted from pre-trained models, such as the x-vector model, is enough to obtain good performance. We then improve emotion recognition performance by fine-tuning the pre-trained model for emotion classification. We evaluated our experiments on three different types of datasets: IEMOCAP, MSP-Podcast, and Crema-D. With fine-tuning, we obtained absolute improvements of 30.40%, 7.99%, and 8.61% on IEMOCAP, MSP-Podcast, and Crema-D, respectively, over a baseline model without pre-training. Finally, we present results on the effect of emotion on speaker verification. We observed that speaker verification performance is sensitive to changes in the test speaker's emotion. We found that trials with angry utterances performed worst on all three datasets. We hope our analysis will initiate a new line of research in the speaker recognition community.
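
The per-emotion speaker verification analysis can be illustrated with a small sketch. It assumes that verification trials have already been scored (for example, by some similarity measure between x-vectors; the abstract does not state which back-end the authors used) and that each trial is tagged with the emotion of its test utterance. It computes an equal error rate (EER) separately for each emotion group. The function names `compute_eer` and `eer_by_emotion` and the toy trials are hypothetical and do not come from the paper; this is a simple threshold-sweep approximation of EER, not the authors' evaluation code.

```python
from collections import defaultdict


def compute_eer(scores, labels):
    """Approximate the equal error rate (EER) by sweeping candidate thresholds.

    scores: similarity scores, where higher means "same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns the mean of the false-acceptance and false-rejection rates at
    the threshold where the two rates are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, best_eer = float("inf"), None
    # Every observed score is a candidate threshold; +inf means "reject all".
    for thr in sorted(set(scores)) + [float("inf")]:
        # False acceptance: non-target trials scored at or above the threshold.
        fa = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        # False rejection: target trials scored below the threshold.
        fr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        far, frr = fa / n_nontarget, fr / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def eer_by_emotion(trials):
    """Group trials by test-utterance emotion and compute one EER per group.

    trials: iterable of (score, label, test_emotion) tuples.
    """
    groups = defaultdict(lambda: ([], []))
    for score, label, emotion in trials:
        groups[emotion][0].append(score)
        groups[emotion][1].append(label)
    return {emo: compute_eer(s, y) for emo, (s, y) in groups.items()}


# Hypothetical toy trials: (score, is_target, test_emotion). Not real data.
toy_trials = [
    (0.9, 1, "neutral"), (0.8, 1, "neutral"), (0.2, 0, "neutral"), (0.1, 0, "neutral"),
    (0.6, 1, "angry"), (0.3, 1, "angry"), (0.5, 0, "angry"), (0.2, 0, "angry"),
]
print(eer_by_emotion(toy_trials))  # {'neutral': 0.0, 'angry': 0.5}
```

Grouping by the test utterance's emotion mirrors the kind of comparison the abstract describes: a higher EER for one group means verification degrades when the test speaker expresses that emotion. The toy numbers above only demonstrate the mechanics. A real evaluation would use thousands of trials and typically reports additional metrics such as a detection cost function.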
