Paper title
Multi-modal Identification of State-Sponsored Propaganda on Social Media
Paper authors
Paper abstract
The prevalence of state-sponsored propaganda on the Internet has become a cause for concern in recent years. Although much effort has gone into identifying state-sponsored Internet propaganda, the problem remains far from solved: the ambiguous definition of propaganda leads to unreliable data labelling, and the huge number of potential predictive features makes models hard to interpret. This paper is the first attempt to build a balanced dataset for this task. The dataset comprises propaganda by three different organizations across two time periods. We propose a multi-model framework that detects propaganda messages based solely on their visual and textual content. It achieves promising performance on detecting propaganda by the three organizations, both within the same time period (training and testing on data from the same period; F1=0.869) and across time periods (training on past data, testing on future data; F1=0.697). To reduce the influence of false positive predictions, we vary the decision threshold to examine the relationship between the false positive and true positive rates, and we use visualization tools to explain the predictions made by our models, enhancing the interpretability of our framework. Our new dataset and general framework provide a strong benchmark for the task of identifying state-sponsored Internet propaganda and point out a potential path for future work on this task.
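The threshold analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `rates_at_threshold`, the scores, and the labels are all hypothetical, and the sketch assumes only that a classifier outputs a propaganda score per post, which is compared against a threshold. It shows how raising the threshold trades true positives for fewer false positives.

```python
from typing import List, Tuple


def rates_at_threshold(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float, float]:
    """Return (true positive rate, false positive rate, F1) at one threshold.

    A post is predicted as propaganda when its score is >= threshold.
    Labels use 1 for propaganda and 0 for non-propaganda.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes so the ratios are always defined.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return tpr, fpr, f1


# Made-up scores and labels, for illustration only (not data from the paper).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    tpr, fpr, f1 = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}  F1={f1:.2f}")
```

On this toy data, moving the threshold from 0.25 to 0.75 lowers the false positive rate from 0.50 to 0.00 while the true positive rate falls from 1.00 to 0.50, which is the kind of trade-off a threshold sweep is meant to expose.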