Signed Latent Factors for Spamming Activity Detection

Paper Title

Signed Latent Factors for Spamming Activity Detection

Authors

Liu, Yuli

Abstract

Due to the growing trend of performing spamming activities (e.g., Web spam, deceptive reviews, and fake followers) on various online platforms to gain undeserved benefits, spam detection has emerged as a hot research issue. Previous attempts to combat spam mainly employ features related to metadata, user behaviors, or relational ties. These studies have made considerable progress in understanding and filtering spamming campaigns, but the problem remains far from solved. Almost all of the proposed features focus on a limited number of observed attributes or explainable phenomena, which makes further improvement difficult for existing methods. To broaden the perspective on the spam problem and to address two long-standing challenges in spam detection, class imbalance and graph incompleteness, we propose utilizing signed latent factors to filter fraudulent activities. In this setting, the spam-contaminated relational datasets of multiple online applications are represented as a unified signed network. We design two competitive and highly dissimilar latent factor mining (LFM) algorithms, based on multi-relational likelihood estimation (LFM-MRLE) and signed pairwise ranking (LFM-SPR), respectively. We then explore how to apply the mined latent factors to spam detection tasks. Experiments on real-world datasets from different kinds of Web applications (social media and Web forums) indicate that the LFM models outperform state-of-the-art baselines in detecting spamming activities. By deliberately manipulating the experimental data, we validate the effectiveness of our methods in dealing with the incompleteness and imbalance challenges.
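The abstract names signed pairwise ranking as one way to mine latent factors but does not give its objective. As a rough illustration only, the sketch below assumes a BPR-style formulation on an undirected signed graph: each node gets a latent vector, and for every node u the dot-product score of a positive neighbour should exceed that of an unobserved node, which in turn should exceed that of a negative neighbour. The downstream step is deliberately simple: an unlabeled node takes the label of whichever labeled seed it scores highest with. The function names (`train_signed_ranking`, `label_by_seeds`), the three-tier ordering, all hyperparameters, the toy graph, and the seed-based labeling rule are hypothetical choices for this sketch; it is not the paper's LFM-SPR algorithm.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_step(U, u, a, b, lr, reg):
    """One SGD step that pushes score(u, a) above score(u, b).

    Ascends ln sigmoid(<U[u], U[a]> - <U[u], U[b]>) with L2 regularisation.
    """
    uu, ua, ub = U[u][:], U[a][:], U[b][:]  # copy before updating in place
    g = 1.0 - sigmoid(dot(uu, ua) - dot(uu, ub))  # d/dx of ln sigmoid(x)
    for f in range(len(uu)):
        U[u][f] += lr * (g * (ua[f] - ub[f]) - reg * uu[f])
        U[a][f] += lr * (g * uu[f] - reg * ua[f])
        U[b][f] += lr * (-g * uu[f] - reg * ub[f])


def train_signed_ranking(n, pos_edges, neg_edges, dim=8, lr=0.05, reg=0.01,
                         epochs=300, samples=5, seed=0):
    """Hypothetical trainer: per node u, rank by dot product
    positive neighbour > unobserved node > negative neighbour."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]
    pos = [set() for _ in range(n)]
    neg = [set() for _ in range(n)]
    for u, v in pos_edges:  # this sketch treats edges as undirected
        pos[u].add(v)
        pos[v].add(u)
    for u, v in neg_edges:
        neg[u].add(v)
        neg[v].add(u)
    unobs = [sorted(set(range(n)) - pos[u] - neg[u] - {u}) for u in range(n)]
    for _ in range(epochs):
        for u in range(n):
            if not pos[u] or not neg[u]:
                continue  # only nodes with both edge signs are trained here
            for _ in range(samples):
                i = rng.choice(sorted(pos[u]))
                j = rng.choice(sorted(neg[u]))
                if unobs[u]:
                    k = rng.choice(unobs[u])
                    bpr_step(U, u, i, k, lr, reg)  # positive above unobserved
                    bpr_step(U, u, k, j, lr, reg)  # unobserved above negative
                else:
                    bpr_step(U, u, i, j, lr, reg)  # positive above negative
    return U


def label_by_seeds(U, seeds):
    """Hypothetical downstream step: each node takes the label of the
    seed node it has the highest dot-product score with."""
    labels = {}
    for x in range(len(U)):
        if x in seeds:
            labels[x] = seeds[x]
        else:
            best = max(seeds, key=lambda s: dot(U[x], U[s]))
            labels[x] = seeds[best]
    return labels


if __name__ == "__main__":
    # Toy signed network: users 0-3 trust each other, accounts 4-6 trust
    # each other, and some users distrust some of those accounts.
    pos_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                 (4, 5), (4, 6), (5, 6)]
    neg_edges = [(0, 4), (1, 5), (2, 6), (3, 4)]
    U = train_signed_ranking(7, pos_edges, neg_edges)
    print(label_by_seeds(U, {0: "normal", 4: "spam"}))
```

Ranking unobserved nodes between positive and negative neighbours is one way to make use of missing links rather than ignoring them, which loosely relates to the incompleteness challenge the abstract mentions; whether the paper does anything similar is not stated in the abstract.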
