Feature selection using Benford’s law to support detection of malicious social media bots, Information Sciences

Feature selection using Benford’s law to support detection of malicious social media bots
Information Sciences (IF 8.1), Pub Date: 2021-09-15, DOI: 10.1016/j.ins.2021.09.038
Innocent Mbona, Jan H.P. Eloff

The increased amount of high-dimensional imbalanced data in online social networks challenges existing feature selection methods. Although feature selection methods such as principal component analysis (PCA) are effective for solving high-dimensional imbalanced data problems, they can be computationally expensive. Hence, an effortless approach for identifying meaningful features that are indicative of anomalous behaviour between humans and malicious bots is presented herein. The most recent Twitter dataset that encompasses the behaviour of various types of malicious bots (including fake followers, retweet spam, fake advertisements, and traditional spambots) is used to understand the behavioural traits of such bots. The approach is based on Benford’s law for predicting the frequency distribution of significant leading digits. This study demonstrates that features closely obey Benford’s law on a human dataset, whereas the same features violate Benford’s law on a malicious bot dataset. Finally, it is demonstrated that the features identified by Benford’s law are consistent with those identified via PCA and the ensemble random forest method on the same datasets. This study contributes to the intelligent detection of malicious bots such that their malicious activities, such as the dissemination of spam, can be minimised.
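As a rough illustration of the idea in the abstract, Benford's law gives the expected frequency of the leading digit d as P(d) = log10(1 + 1/d) for d = 1, ..., 9. The sketch below compares the observed leading-digit distribution of a feature against that expectation. It is not the authors' implementation: the function names, the use of mean absolute deviation as the conformity score, and the synthetic example values are all assumptions made for illustration, and features are assumed to be integer counts (for example, follower counts).

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """Return the leading digit of an integer count, or None for zero."""
    value = abs(int(value))
    if value == 0:
        return None  # zero has no significant leading digit
    return int(str(value)[0])


def observed_distribution(values):
    """Relative frequency of each leading digit 1..9 among non-zero values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_deviation(values):
    """Mean absolute deviation from Benford's law (assumed score, lower = closer)."""
    expected = benford_expected()
    observed = observed_distribution(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


# Synthetic data only, chosen to show the contrast; not taken from the paper.
benford_like = [2 ** k for k in range(100)]   # spans many orders of magnitude
clustered = list(range(500, 600))             # every value starts with 5

print("Benford-like feature deviation:", round(benford_deviation(benford_like), 4))
print("Clustered feature deviation:   ", round(benford_deviation(clustered), 4))
```

Under this sketch, a feature would be a candidate for selection when its deviation is small on the human dataset and large on the bot dataset; any threshold separating the two would have to be chosen empirically.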

Updated: 2021-09-24
