Paper title
Interpretability for Multimodal Emotion Recognition using Concept Activation Vectors
Paper authors
Paper abstract
Multimodal Emotion Recognition refers to classifying input video sequences into emotion labels based on multiple input modalities (usually video, audio and text). In recent years, deep neural networks have achieved remarkable results in recognizing human emotions, performing on par with humans on this task. Despite these advances, emotion recognition systems have yet to be adopted in real-world settings because their reasoning and decision-making processes are opaque. Most research in this field proposes novel architectures to improve performance on this task, with only a few attempts to explain these models' decisions. In this paper, we address the interpretability of neural networks for emotion recognition using Concept Activation Vectors (CAVs). To analyse the model's latent space, we define human-understandable concepts specific to Emotion AI and map them to the widely used IEMOCAP multimodal database. We then evaluate the influence of our proposed concepts at multiple layers of the Bi-directional Contextual LSTM (BC-LSTM) network to show that the reasoning process of neural networks for emotion recognition can be represented using human-understandable concepts. Finally, we perform hypothesis testing on our proposed concepts to show that they are statistically significant for interpreting this task.
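The abstract gives no implementation details. As a rough illustration only, the Python sketch below shows the generic CAV recipe popularised by TCAV (Kim et al., 2018), not the authors' code. A linear classifier separates a layer's activations for concept examples from those for random examples; its unit-normal weight vector is the CAV. The TCAV score is the fraction of inputs whose target-class logit gradient has a positive dot product with the CAV, and significance is checked by comparing concept scores against scores from random-vs-random CAVs. Everything concrete here is an assumption: the function names (`train_cav`, `tcav_score`, `welch_t`, `draw`) are hypothetical, SGD-trained L2-regularised logistic regression stands in for whatever linear classifier the paper uses, Welch's t-statistic stands in for its hypothesis test, and synthetic 4-D vectors replace real BC-LSTM activations, gradients and IEMOCAP concept sets.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_cav(concept_acts, random_acts, epochs=100, lr=0.1, l2=0.01, seed=0):
    """Hypothetical helper: fit an L2-regularised logistic regression
    (concept = 1, random = 0) by SGD and return its unit-normalised weight
    vector, which points towards the concept examples, as the CAV."""
    shuffler = random.Random(seed)
    dim = len(concept_acts[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in concept_acts] + [(x, 0.0) for x in random_acts]
    for _ in range(epochs):
        shuffler.shuffle(data)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dz
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
    return [wi / norm for wi in w]


def tcav_score(logit_grads, cav):
    """Fraction of inputs whose directional derivative along the CAV is positive."""
    positive = sum(1 for g in logit_grads if sum(gi * ci for gi, ci in zip(g, cav)) > 0)
    return positive / len(logit_grads)


def welch_t(a, b):
    """Welch's t-statistic comparing two lists of TCAV scores."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


# --- Synthetic demo: all data below is hypothetical, not IEMOCAP/BC-LSTM ---
rng = random.Random(42)
DIM = 4


def draw(n, shift=0.0, sd=0.5):
    # n random DIM-dimensional vectors; `shift` moves only dimension 0.
    return [[rng.gauss(shift if d == 0 else 0.0, sd) for d in range(DIM)] for _ in range(n)]


concept = draw(50, shift=2.0)         # "concept" activations differ along dim 0
grads = draw(100, shift=1.0, sd=0.3)  # target-class logit gradients also favour dim 0

concept_scores, random_scores = [], []
for run in range(8):
    # Concept CAV vs a fresh random set, plus a random-vs-random baseline CAV.
    concept_scores.append(tcav_score(grads, train_cav(concept, draw(50), seed=run)))
    random_scores.append(tcav_score(grads, train_cav(draw(50), draw(50), seed=run)))

print("concept TCAV scores:", concept_scores)
print("random TCAV scores:", random_scores)
print("Welch t:", round(welch_t(concept_scores, random_scores), 2))
```

In a real analysis the activations and gradients would be extracted from a chosen BC-LSTM layer (for example with an autodiff framework), and the t-statistic would be turned into a p-value using a t-distribution (e.g. `scipy.stats.ttest_ind(..., equal_var=False)`), ideally with a correction when many concepts are tested.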