Paper Title

EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video

Authors

Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad

Abstract

Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expression of viewers who reacted to the video. We use an expression recognition model within our data collection framework to achieve scalability. In total, there are 36.7 million annotations of viewer facial reactions to 23,574 videos (1,700 hours). We use a publicly available video corpus to obtain a diverse set of video content. We establish baseline performance on the EEV dataset using an existing multimodal recurrent model. Transfer learning experiments show an improvement in performance on the LIRIS-ACCEDE video dataset when pre-trained on EEV. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing. A subset of EEV is released at https://github.com/google-research-datasets/eev.
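As a quick consistency check on the totals quoted above, the 36.7 million annotations follow directly from the stated sampling rate and corpus size (this sketch assumes annotations are sampled uniformly at 6 Hz across all 1,700 hours of video):

```python
# Verify the abstract's figures: 1,700 hours of video annotated at 6 Hz
# should yield roughly 36.7 million viewer-reaction annotations.
TOTAL_HOURS = 1_700
SAMPLE_RATE_HZ = 6
NUM_VIDEOS = 23_574

total_annotations = TOTAL_HOURS * 3600 * SAMPLE_RATE_HZ  # seconds * Hz
print(total_annotations)                 # 36,720,000 ≈ the quoted 36.7 million
print(total_annotations / NUM_VIDEOS)    # average annotations per video
```

Each of these ~36.7M annotation frames carries 15 continuous evoked-expression labels, so the label matrix for the full corpus has on the order of 550 million entries.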
