Paper Title

Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

Paper Authors

Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller

Paper Abstract

Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance -- and thus, of understanding linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. Valence predictions of the transformer model are very reactive to positive and negative sentiment content, as well as negations, but not to intensifiers or reducers, while none of those linguistic features impact arousal or dominance. These findings show that transformers can successfully leverage linguistic information to improve their valence predictions, and that linguistic analysis should be included in their testing.
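
The probing idea can be illustrated with a minimal Python sketch: synthesise prosodically neutral renditions of sentiment-varied minimal pairs, then compare the SER model's valence, arousal, and dominance predictions across them. This is a sketch under stated assumptions, not the paper's exact setup: it assumes the open-source espeak-ng TTS is installed, and `predict_avd` is a hypothetical placeholder for whichever transformer SER model is under test; the example texts are illustrative.

```python
"""Minimal probing sketch: synthesise prosodically neutral utterances for
sentiment-varied texts and compare a SER model's valence predictions."""
import subprocess

import numpy as np
import soundfile as sf

# Minimal pairs covering the probed linguistic features:
# sentiment polarity, negation, intensifier, reducer.
TEXTS = {
    "positive":    "Today was a wonderful day.",
    "negative":    "Today was a terrible day.",
    "negated":     "Today was not a wonderful day.",
    "intensified": "Today was a very wonderful day.",
    "reduced":     "Today was a slightly wonderful day.",
}


def synthesise(text: str, path: str) -> None:
    """Render prosodically flat speech with espeak-ng (open-source TTS)."""
    subprocess.run(["espeak-ng", "-w", path, text], check=True)


def predict_avd(waveform: np.ndarray, sampling_rate: int) -> dict:
    """Hypothetical stub: run the transformer SER model under test and
    return {"arousal": ..., "dominance": ..., "valence": ...} scores."""
    raise NotImplementedError("plug in the SER model being probed")


for label, text in TEXTS.items():
    wav = f"{label}.wav"
    synthesise(text, wav)
    audio, sr = sf.read(wav)
    scores = predict_avd(audio, sr)
    print(f"{label:12s} "
          f"valence={scores['valence']:.2f} "
          f"arousal={scores['arousal']:.2f} "
          f"dominance={scores['dominance']:.2f}")
```

If the model behaves as the abstract reports, the valence scores should separate the positive, negative, and negated variants, while the intensified and reduced variants, as well as the arousal and dominance scores, should remain largely unchanged.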
