Paper Title
Leveraging Users' Social Network Embeddings for Fake News Detection on Twitter
Authors
Abstract
Social networks (SNs) are increasingly important sources of news for many people. The online connections made by users allow information to spread more easily than through traditional news media (e.g., newspapers, television). However, these connections also make it easier for fake news to spread than in traditional media. In this paper, we investigate whether the connection structure of SN users can aid fake news detection on Twitter. In particular, we propose to embed users based on their follower or friendship networks on the Twitter platform, so as to identify the groups that users form. Indeed, by applying unsupervised graph embedding methods to the graphs built from Twitter users' social network connections, we observe that users engaged with fake news are more tightly clustered together than users engaged only with factual news. Thus, we hypothesise that the embedded users' network can help detect fake news effectively. Through extensive experiments using a publicly available Twitter dataset, our results show that applying graph embedding methods on SNs, using the user connections as network information, can indeed classify fake news more effectively than most language-based approaches. Specifically, we observe a significant improvement over using only the textual information (i.e., TF.IDF or a BERT language model), as well as over models that deploy both advanced textual features (i.e., stance detection) and complex network features (e.g., users network, publishers cross citations). We conclude that the Twitter users' friendship and followers network information can significantly outperform language-based approaches, as well as the existing state-of-the-art fake news detection models that use a more sophisticated network structure, in classifying fake news on Twitter.
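The pipeline outlined in the abstract (embed users from their connection graph without supervision, then classify news items from the embeddings of the users who engaged with them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy graph, the user names, the choice of a spectral embedding computed by orthogonal iteration as the unsupervised method, the averaging of user vectors per news item, and the nearest-centroid classifier.

```python
import math
import random


def lazy_normalized_adjacency(nodes, edges):
    """Return M = (I + D^-1/2 A D^-1/2) / 2 as a dense list of lists.

    Assumes every user has at least one connection (non-zero degree).
    """
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:  # follow links are treated as undirected ties
        adj[index[u]][index[v]] = adj[index[v]][index[u]] = 1.0
    deg = [sum(row) for row in adj]
    return [[(0.5 if i == j else 0.0) + 0.5 * adj[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]


def orthonormalize(cols):
    """Gram-Schmidt orthonormalization of a list of column vectors."""
    basis = []
    for col in cols:
        for b in basis:
            dot = sum(x * y for x, y in zip(col, b))
            col = [x - dot * y for x, y in zip(col, b)]
        norm = math.sqrt(sum(x * x for x in col))
        basis.append([x / norm for x in col])
    return basis


def embed_users(nodes, edges, dim=2, iters=200, seed=0):
    """Unsupervised spectral embedding: top `dim` eigenvectors of M."""
    m = lazy_normalized_adjacency(nodes, edges)
    rng = random.Random(seed)
    cols = orthonormalize([[rng.gauss(0, 1) for _ in nodes] for _ in range(dim)])
    for _ in range(iters):  # orthogonal (subspace) iteration
        cols = orthonormalize([[sum(a * x for a, x in zip(row, col)) for row in m]
                               for col in cols])
    return {u: [col[i] for col in cols] for i, u in enumerate(nodes)}


def mean_vector(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]


def news_vector(sharers, user_emb):
    """Represent a news item by the mean embedding of the users who shared it."""
    return mean_vector([user_emb[u] for u in sharers])


def fit_centroids(train, user_emb):
    """Compute one centroid per label from (sharers, label) training pairs."""
    by_label = {}
    for sharers, label in train:
        by_label.setdefault(label, []).append(news_vector(sharers, user_emb))
    return {label: mean_vector(vs) for label, vs in by_label.items()}


def predict(sharers, centroids, user_emb):
    """Assign the label of the nearest centroid in embedding space."""
    v = news_vector(sharers, user_emb)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))


# Hypothetical toy network: two tightly knit groups joined by a single tie.
group_a = ["a0", "a1", "a2", "a3"]
group_b = ["b0", "b1", "b2", "b3"]
edges = [(u, v) for g in (group_a, group_b) for i, u in enumerate(g) for v in g[i + 1:]]
edges.append(("a3", "b0"))

user_emb = embed_users(group_a + group_b, edges)

# Hypothetical labelled news items, each described only by who shared it.
train = [
    (["a0", "a1", "a2"], "fake"),
    (["a1", "a2", "a3"], "fake"),
    (["b0", "b1", "b2"], "real"),
    (["b1", "b2", "b3"], "real"),
]
centroids = fit_centroids(train, user_emb)

print(predict(["a0", "a3"], centroids, user_emb))
print(predict(["b0", "b3"], centroids, user_emb))
```

The sketch uses no text features at all: a news item is classified purely from where its sharers sit in the embedded network, which is the idea the abstract argues for. A realistic setup would replace the dense matrix and the toy classifier with a scalable graph embedding method and a trained model.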