Paper Title
"I Can't Keep It Up." A Dataset from the Defunct Voat.co News Aggregator
Paper Authors
Paper Abstract
Voat.co was a news aggregator website that shut down on December 25, 2020. The site had a troubled history and was known for hosting various banned subreddits. This paper presents a dataset with over 2.3M submissions and 16.2M comments posted by 113K users in 7.1K subverses (Voat's equivalent of subreddits). Our dataset covers the whole lifetime of Voat, from the start of its development period on November 8, 2013, through its founding in April 2014, up until the day it shut down (December 25, 2020). To the best of our knowledge, this work presents the largest and most complete publicly available Voat dataset. Along with the release of this dataset, we present a preliminary analysis covering posting activity and daily user and subverse registrations on the platform, so that researchers interested in our dataset know what to expect. Our data may prove helpful to studies of false news dissemination: we analyze the links users share on the platform and find that many communities rely on alternative news outlets, such as Breitbart and GatewayPundit, for their daily discussions. In addition, we perform network analysis on user interactions and find that many users prefer not to interact with subverses outside their narrative interests, which could be helpful to researchers focusing on polarization and echo chambers. Also, since Voat was one of the platforms to which banned Reddit communities migrated, we are confident our dataset will motivate and assist researchers studying deplatforming. Finally, many hateful and conspiratorial communities were very popular on Voat, which makes our work valuable for researchers focusing on toxicity, conspiracy theories, cross-platform studies of social networks, and natural language processing.