3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos

Paper Title

3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos

Authors

Vikram Gupta, Trisha Mittal, Puneet Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera, Debdoot Mukherjee, Dinesh Manocha

Abstract

We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly annotated dataset of diverse short videos extracted from the short-video social media platform Moj. 3MASSIV comprises 50K short videos (20 seconds average duration) and 100K unlabeled videos in 11 different languages, and captures popular short-video trends such as pranks, fails, romance, and comedy, expressed via unique audio-visual formats such as self-shot videos, reaction videos, lip-synching, and self-sung songs. 3MASSIV presents an opportunity for multimodal and multilingual semantic understanding of these unique videos by annotating them for concepts, affective states, media types, and audio language. We present a thorough analysis of 3MASSIV and highlight the variety and unique aspects of our dataset compared to other contemporary popular datasets, with strong baselines. We also show how the social media content in 3MASSIV is dynamic and temporal in nature, which can be used for semantic understanding tasks and cross-lingual analysis.
