SimilCatch: Enhanced social spammers detection on Twitter using Markov Random Fields
Information Processing & Management (IF 7.4), Pub Date: 2020-06-29, DOI: 10.1016/j.ipm.2020.102317
Nour El-Mawass, Paul Honeine, Laurent Vercouter

The problem of social spam detection has traditionally been modeled as a supervised classification problem. Despite the initial success of this approach, later analyses of proposed systems and detection features have shown that, as with email spam, the dynamic and adversarial nature of social spam makes the performance achieved by supervised systems hard to maintain. In this paper, we investigate the possibility of using the output of previously proposed supervised classification systems as a tool for spammer discovery. The hypothesis is that these systems remain highly reliable in the spammers they do detect, even when their recall is far from perfect. We then propose to use the output of these classifiers as prior beliefs in a probabilistic graphical model framework, which allows beliefs to be propagated to similar social accounts. Basing similarity on a who-connects-to-whom network has been empirically criticized in recent literature, so we propose an alternative definition based on a bipartite users-content interaction graph. For evaluation, we build a Markov Random Field on a graph of similar users and compute prior beliefs using a selection of state-of-the-art classifiers. We apply Loopy Belief Propagation to obtain posterior predictions for users. The proposed system is evaluated on a recent Twitter dataset that we collected and manually labeled. Classification results show a significant increase in recall while precision is maintained. This validates that formulating the detection problem within an undirected graphical model framework makes it possible to restore the degraded performance of previously proposed statistical classifiers and to effectively mitigate the effect of spam evolution.
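The abstract does not specify the similarity measure or the MRF potentials, so the following is only a minimal sketch of the general pipeline it describes, not the authors' SimilCatch implementation. It assumes, hypothetically, that two users are similar when the Jaccard overlap of the content items they interact with (e.g. shared URLs or hashtags) reaches a threshold, that a classifier supplies each user's prior probability of being a spammer, and that every edge carries a homophily potential controlled by an illustrative parameter eps. Posterior spam probabilities are then computed with sum-product Loopy Belief Propagation on the resulting pairwise binary MRF. The names similarity_edges, loopy_bp, threshold, eps and the toy data are made up for this example.

```python
from itertools import combinations


def similarity_edges(user_items, threshold=0.3):
    """Link users whose content sets overlap enough (hypothetical Jaccard rule).

    user_items maps a user id to the set of content items (URLs, hashtags, ...)
    that user interacted with, i.e. the user side of a bipartite users-content graph.
    """
    edges = []
    for u, v in combinations(sorted(user_items), 2):
        a, b = user_items[u], user_items[v]
        union = a | b
        if union and len(a & b) / len(union) >= threshold:
            edges.append((u, v))
    return edges


def loopy_bp(priors, edges, eps=0.1, iters=50, tol=1e-9):
    """Sum-product loopy BP on a binary pairwise MRF (0 = legitimate, 1 = spammer).

    priors[u] is the classifier's P(spammer) for u, used as the node potential.
    The edge potential favors equal labels on similar users with weight 1 - eps.
    Returns the posterior P(spammer) for every user.
    """
    psi = [[1 - eps, eps], [eps, 1 - eps]]  # psi[x_u][x_v]
    phi = {u: [1 - p, p] for u, p in priors.items()}
    nbrs = {u: [] for u in priors}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # msgs[(u, v)][x_v]: normalized message from u to v about v's state
    msgs = {(u, v): [0.5, 0.5] for u in nbrs for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msgs:
            out = []
            for xv in (0, 1):
                total = 0.0
                for xu in (0, 1):
                    prod = phi[u][xu] * psi[xu][xv]
                    # include messages from all of u's neighbors except v
                    for w in nbrs[u]:
                        if w != v:
                            prod *= msgs[(w, u)][xu]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(u, v)] = [o / z for o in out]
        delta = max((abs(new[k][i] - msgs[k][i]) for k in msgs for i in (0, 1)),
                    default=0.0)
        msgs = new
        if delta < tol:  # messages have converged
            break
    beliefs = {}
    for u in phi:
        b = list(phi[u])
        for w in nbrs[u]:
            for x in (0, 1):
                b[x] *= msgs[(w, u)][x]
        beliefs[u] = b[1] / sum(b)
    return beliefs


if __name__ == "__main__":
    user_items = {
        "a": {"url1", "url2", "#promo"},
        "b": {"url1", "url2"},
        "c": {"#news", "url9"},
    }
    priors = {"a": 0.9, "b": 0.5, "c": 0.2}  # hypothetical classifier outputs
    edges = similarity_edges(user_items)
    for user, p in sorted(loopy_bp(priors, edges).items()):
        print(user, round(p, 3))
```

On this toy data only a and b share enough content to be linked. User b starts from a neutral prior of 0.5, and because its only similar neighbor a is confidently flagged, its posterior rises to 0.82, while the isolated user c keeps its prior of 0.2. This mirrors, in miniature, how propagating beliefs to similar accounts can recover spammers that a classifier alone would miss.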




Updated: 2020-06-29