Paper Title
Affection: Learning Affective Explanations for Real-World Visual Data
Paper Authors
Paper Abstract
In this work, we explore the emotional reactions that real-world images tend to induce, using natural language as the medium to express the rationale behind an affective response to a given visual stimulus. To embark on this journey, we introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images, analyzed by 6,283 annotators who were asked to indicate and explain how and why they felt a particular way when observing a specific image, producing a total of 526,749 responses. Even though emotional reactions are subjective and sensitive to context (personal mood, social status, past experiences), we show that there is significant common ground for capturing potentially plausible emotional responses with broad support in the subject population. In light of this crucial observation, we ask the following questions: i) Can we develop multi-modal neural networks that provide reasonable affective responses to real-world visual data, explained with language? ii) Can we steer such methods towards producing explanations with varying degrees of pragmatic language, or justifying different emotional reactions, while adapting to the underlying visual stimulus? Finally, iii) How can we evaluate the performance of such methods on this novel task? With this work, we take the first steps in addressing all of these questions, thus paving the way for richer, more human-centric, and emotionally aware image analysis systems. Our introduced dataset and all developed methods are available at https://affective-explanations.org
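The abstract does not specify the dataset's release schema; the actual format is defined by the release at https://affective-explanations.org. As a minimal sketch, assuming each of the 526,749 responses pairs an image with an indicated emotion and a free-form textual explanation, a record type and a grouping helper (all field and function names hypothetical) might look like this:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AffectiveResponse:
    """One annotator's reaction to one image (hypothetical schema;
    field names are illustrative, not the dataset's actual format)."""
    image_id: str      # identifier of one of the 85,007 public images
    annotator_id: str  # one of the ~6,283 annotators
    emotion: str       # the indicated emotional reaction, e.g. "awe"
    explanation: str   # free-form text explaining why they felt that way


def group_by_image(responses):
    """Collect all responses for each image, so that the agreement
    across annotators on the same stimulus can be inspected."""
    by_image = defaultdict(list)
    for r in responses:
        by_image[r.image_id].append(r)
    return by_image


# Usage example with toy data:
responses = [
    AffectiveResponse("img_001", "a01", "awe", "The vast canyon makes me feel small."),
    AffectiveResponse("img_001", "a02", "awe", "Such scale is humbling and beautiful."),
]
print({k: len(v) for k, v in group_by_image(responses).items()})  # {'img_001': 2}
```

Grouping responses per image is one simple way to probe the common ground across the subject population that the paper highlights, since multiple annotators react to the same visual stimulus.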