Paper Title

Video SemNet: Memory-Augmented Video Semantic Network

Paper Authors

Prashanth Vijayaraghavan, Deb Roy

Paper Abstract

Stories are a very compelling medium for conveying ideas, experiences, and social and cultural values. Narrative is a specific manifestation of a story that turns it into knowledge for the audience. In this paper, we propose a machine learning approach to capture the narrative elements in movies by bridging the gap between low-level data representations and the semantic aspects of the visual medium. We present a Memory-Augmented Video Semantic Network, called Video SemNet, to encode semantic descriptors and learn an embedding for the video. The model employs two main components: (i) a neural semantic learner that learns latent embeddings of semantic descriptors and (ii) a memory module that retains and memorizes specific semantic patterns from the video. We evaluate the video representations obtained from variants of our model on two tasks: (a) genre prediction and (b) IMDB rating prediction. We demonstrate that our model is able to predict genres and IMDB ratings with weighted F-1 scores of 0.72 and 0.63, respectively. The results are indicative of the representational power of our model and the ability of such representations to measure audience engagement.
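
The abstract only names the two components at a high level. As a rough illustration, here is a minimal PyTorch-style sketch of how such an architecture might be wired, assuming semantic descriptors arrive as token IDs, the semantic learner is a recurrent encoder, and the memory module is a bank of learned slots read via attention. All class names, dimensions, and the attention-based memory read are assumptions made for this sketch, not the authors' actual design.

```python
# Hypothetical sketch of a memory-augmented video semantic network.
# Component names, dimensions, and the slot-attention memory read are
# illustrative assumptions, not the implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticLearner(nn.Module):
    """Encodes a sequence of semantic descriptors into latent embeddings."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, descriptors):            # (batch, seq_len) descriptor ids
        states, _ = self.encoder(self.embed(descriptors))
        return states                           # (batch, seq_len, hidden_dim)


class MemoryModule(nn.Module):
    """Retains salient semantic patterns via attention over learned memory slots."""

    def __init__(self, hidden_dim=256, num_slots=32):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, hidden_dim))

    def forward(self, states):                  # (batch, seq_len, hidden_dim)
        attn = F.softmax(states @ self.memory.t(), dim=-1)  # attention over slots
        read = attn @ self.memory                # memory read per time step
        return (states + read).mean(dim=1)       # pooled video embedding


class VideoSemNet(nn.Module):
    """Combines both components and predicts, e.g., genre labels."""

    def __init__(self, vocab_size, num_classes, hidden_dim=256):
        super().__init__()
        self.semantic = SemanticLearner(vocab_size, hidden_dim=hidden_dim)
        self.memory = MemoryModule(hidden_dim=hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, descriptors):
        video_emb = self.memory(self.semantic(descriptors))
        return self.classifier(video_emb), video_emb
```

For evaluation along the lines reported in the abstract, predictions from such a model could be scored with a weighted F-1 metric, e.g. sklearn.metrics.f1_score(y_true, y_pred, average="weighted") on held-out genre labels.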
