Paper Title

Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks

Authors

Jiaying Wu, Bryan Hooi

Abstract

As social media becomes a hotbed for the spread of misinformation, the crucial task of rumor detection has witnessed promising advances fostered by open-source benchmark datasets. Although these datasets are widely used, we find that they suffer from spurious correlations, which existing studies have overlooked and which lead to severe overestimation of existing rumor detection performance. The spurious correlations stem from three causes: (1) event-based data collection and labeling schemes assign the same veracity label to multiple highly similar posts from the same underlying event; (2) merging multiple data sources spuriously relates source identities to veracity labels; and (3) labeling bias. In this paper, we closely investigate three of the most popular rumor detection benchmark datasets (i.e., Twitter15, Twitter16 and PHEME), and propose event-separated rumor detection as a solution to eliminate spurious cues. Under the event-separated setting, we observe that the accuracy of existing state-of-the-art models drops by over 40%, becoming only comparable to that of a simple neural classifier. To better address this task, we propose Publisher Style Aggregation (PSA), a generalizable approach that aggregates publisher posting records to learn writing style and veracity stance. Extensive experiments demonstrate that our method outperforms existing baselines in terms of effectiveness, efficiency and generalizability.
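
To make the event-separated setting concrete, here is a minimal sketch of a split in which no underlying event contributes posts to both the training and the test set. It is an illustration under stated assumptions, not the authors' protocol: the function name `event_separated_split`, the `(post_id, event_id, label)` record layout, and the toy data are all hypothetical and do not reflect the actual format of Twitter15, Twitter16, or PHEME.

```python
import random
from collections import defaultdict


def event_separated_split(posts, test_fraction=0.25, seed=0):
    """Split posts so that every event lands entirely in train or in test.

    `posts` is a list of (post_id, event_id, label) tuples. This record
    layout is a hypothetical stand-in, not a real benchmark format.
    """
    # Group posts by the event they belong to.
    by_event = defaultdict(list)
    for post in posts:
        by_event[post[1]].append(post)

    # Sort first so the shuffle is reproducible for a given seed.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)

    # Hold out whole events, never individual posts.
    n_test = max(1, round(len(events) * test_fraction))
    test_events = set(events[:n_test])

    train = [p for e in events if e not in test_events for p in by_event[e]]
    test = [p for e in events if e in test_events for p in by_event[e]]
    return train, test


# Toy data: four events, several near-duplicate posts per event.
posts = [
    ("p1", "event_a", "rumor"),
    ("p2", "event_a", "rumor"),
    ("p3", "event_b", "non-rumor"),
    ("p4", "event_b", "non-rumor"),
    ("p5", "event_c", "rumor"),
    ("p6", "event_d", "non-rumor"),
    ("p7", "event_d", "non-rumor"),
]

train, test = event_separated_split(posts, test_fraction=0.25, seed=0)
train_events = {event for _, event, _ in train}
test_events = {event for _, event, _ in test}
```

By contrast, a random split at the post level would typically place near-duplicate posts from the same event on both sides, which is the first source of spurious correlation listed in the abstract. With a single event in the input, this sketch would leave the training set empty, so it assumes at least two distinct events.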
