Supervised Learning Methods in Classifying Organized Behavior in Tweet Collections
International Journal on Artificial Intelligence Tools (IF 1.0), Pub Date: 2019-10-01, DOI: 10.1142/s0218213019600017
Erdem Beğenilmiş, Susan Uskudarli

The successful use of social media to manipulate public opinion, with bots and hired individuals spreading (mis)information to unsuspecting users, reached alarming levels with the manipulations during the 2016 US elections and the Brexit deliberations in the UK. Fake interactions such as “liking” and “retweeting” are staged to foster trust in the posts of bots and individuals, which makes it difficult for individuals to detect posts that are part of greater schemes. We propose an approach based on supervised learning to classify collections of tweets as “organized” when they exhibit premeditated intent and as “organic” otherwise. Features related to users and posting behavior are used to train the classifiers on 851 data sets totaling more than 270 million tweets. Further classifiers are trained to assess the effectiveness of the selected features. The random forest algorithm consistently yielded the best results, with scores greater than 95% for both accuracy and F-measure. For comparison, unsupervised learning methods were used to cluster the same data sets. The Gaussian Mixture Model clustered the [organized vs. organic] data set with 99% agreement with the labels. The success of using only behavioral features to detect organized behavior is encouraging.
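The abstract reports three kinds of scores: accuracy, F-measure, and the agreement between cluster assignments and labels. The abstract does not give the exact definitions used in the paper, so the following is only a minimal sketch under stated assumptions: labels are binary (1 for “organized”, 0 for “organic”), the F-measure is the balanced F1 score with “organized” as the positive class, and cluster agreement is taken as the better of the two possible cluster-to-label mappings. All function names and the toy labels are hypothetical and are not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of tweet collections whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Balanced F1 score, treating `positive` ("organized") as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cluster_agreement(y_true: Sequence[int], cluster_ids: Sequence[int]) -> float:
    """Agreement between two clusters (ids 0/1) and binary labels.

    Cluster ids are arbitrary, so both mappings are tried and the better kept.
    """
    direct = accuracy(y_true, cluster_ids)
    flipped = accuracy(y_true, [1 - c for c in cluster_ids])
    return max(direct, flipped)


# Toy labels for illustration only (assumed, not taken from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
clusters = [0, 0, 0, 0, 1, 1, 1, 0]

print(accuracy(labels, predicted))
print(f_measure(labels, predicted))
print(cluster_agreement(labels, clusters))
```

Trying both mappings matters for the clustering comparison because an unsupervised method such as a Gaussian mixture assigns arbitrary cluster indices, so cluster 0 may correspond to either class.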

Updated: 2019-10-01
