Twitter trolls: a linguistic profile of anti-democratic discourse
Language Sciences ( IF 0.816 ) Pub Date : 2020-01-12 , DOI: 10.1016/j.langsci.2019.101268
Jonas Lundberg , Mikko Laitinen

This article focuses on anti-democratic discourse and investigates the linguistic profile of Twitter trolls. The troll data consist of some 3.5 million messages in English obtained through Twitter in late 2018. These data originate from potentially state-backed information operations aimed at sowing discord in Western societies. The baseline data, against which the troll data are compared, contain circa 4.4 million messages in English drawn from the Nordic Tweet Stream corpus. The baseline is pruned with a machine learning application that enables us to select genuine personal messages in this corpus. The empirical part investigates frequency-based characteristics of the two datasets. We utilize a set of automatically extracted word-list information and the observed frequencies of personal pronouns. Our empirical findings show considerable quantitative differences: the troll messages are shorter, make use of a smaller number of lexical types and tokens, and resemble more formal registers, while the personal messages are more spoken-like. The results could be used to improve automated detection systems whose purpose is to identify troll accounts.
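
The frequency-based measures named in the abstract (message length, counts of lexical types and tokens, and personal pronoun frequencies) can be illustrated with a minimal sketch. This is not the authors' pipeline: the regex tokenizer, the `PRONOUNS` set, the `profile` function, and the per-1,000-token normalization are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical pronoun inventory, for illustration only; the article's
# actual list of personal pronouns is not given in the abstract.
PRONOUNS = {"i", "me", "we", "us", "you", "he", "him",
            "she", "her", "it", "they", "them"}

# Assumed tokenizer: runs of ASCII letters and apostrophes, lowercased.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(message):
    """Lowercase a message and split it into simple word tokens."""
    return TOKEN_RE.findall(message.lower())


def profile(messages):
    """Return basic frequency-based measures for a list of messages."""
    tokens = [tok for msg in messages for tok in tokenize(msg)]
    counts = Counter(tokens)
    n_tokens = len(tokens)
    pronoun_hits = sum(counts[p] for p in PRONOUNS)
    return {
        "messages": len(messages),
        "tokens": n_tokens,                 # running words
        "types": len(counts),               # distinct word forms
        "mean_tokens_per_message": n_tokens / len(messages) if messages else 0.0,
        "pronouns_per_1000_tokens": 1000 * pronoun_hits / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    sample = ["I think we should go", "They said it was fine"]
    for name, value in profile(sample).items():
        print(f"{name}: {value}")
```

Running `profile` separately on the troll set and on the baseline set would give two comparable summaries. Normalizing pronoun counts per 1,000 tokens is one common way to compare corpora of different sizes, chosen here as an assumption rather than taken from the article.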




Updated: 2020-01-12