“The Enemy Among Us”
ACM Transactions on the Web (IF 3.5), Pub Date: 2019-07-26, DOI: 10.1145/3324997
Wafa Alorainy, Pete Burnap, Han Liu, Matthew L. Williams

Offensive or antagonistic language targeted at individuals and social groups based on their personal characteristics (also known as cyber hate speech or cyberhate) has been frequently posted and widely circulated via the World Wide Web. This can be considered a key risk factor for individual and societal tension surrounding regional instability. Automated Web-based cyberhate detection is important for observing and understanding community and regional societal tension, especially in online social networks, where posts can be rapidly and widely viewed and disseminated. Previous work has used lexicons, bags-of-words, or probabilistic language parsing approaches, but these often share a common weakness: cyberhate can be subtle and indirect, so relying on the occurrence of individual words or phrases can lead to a significant number of false negatives and an inaccurate representation of trends in cyberhate. This problem motivated us to rethink how subtle language use is represented, such as references to perceived threats from “the other”, including immigration or job prosperity, in a hateful context. We propose a novel “othering” feature set that draws on language use around the concept of “othering” and on intergroup threat theory to identify these subtleties, and we implement a wide range of classification methods that use embedding learning to compute semantic distances between parts of speech considered to be part of an “othering” narrative. To validate our approach, we conducted two sets of experiments. The first compared the results of our novel method with state-of-the-art baseline models from the literature; our approach outperformed all existing methods. The second tested the best-performing models from the first phase on unseen datasets for different types of cyberhate, namely religion, disability, race, and sexual orientation. Applying our model yielded F-measure scores for classifying hateful instances of 0.81, 0.71, 0.89, and 0.72, respectively, demonstrating that the “othering” narrative can be an important part of model generalization.
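
The abstract states that the classifiers use embedding learning to compute semantic distances between parts of speech in an “othering” narrative, but it does not say which parts of speech, which embeddings, or which distance measure are used. The sketch below is therefore only an illustration under stated assumptions, not the authors' feature set: it assumes pronoun-verb pairs (e.g. “them” and “send”), cosine distance, hand-supplied POS tags, and tiny invented embedding vectors. The names TOY_EMBEDDINGS, cosine_distance, and othering_distances are hypothetical.

```python
import math

# HYPOTHETICAL toy 3-dimensional embeddings, invented for illustration only.
# A real system would learn vectors (e.g. with word2vec) from a large corpus.
TOY_EMBEDDINGS = {
    "them": [0.9, 0.1, 0.0],
    "they": [0.85, 0.15, 0.05],
    "send": [0.7, 0.3, 0.1],
    "home": [0.2, 0.8, 0.1],
    "we": [0.1, 0.9, 0.2],
}

# ASSUMPTION: the feature pairs pronouns with verbs; the paper may use other POS.
PRONOUN, VERB = "PRON", "VERB"


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Treat a zero vector as maximally dissimilar rather than dividing by zero.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def othering_distances(tagged_tokens, embeddings):
    """HYPOTHETICAL feature: cosine distance for every (pronoun, verb) pair.

    tagged_tokens is a list of (word, pos_tag) pairs; words with no
    embedding are skipped. Returns a dict keyed by (pronoun, verb).
    """
    pronouns = [w for w, t in tagged_tokens if t == PRONOUN and w in embeddings]
    verbs = [w for w, t in tagged_tokens if t == VERB and w in embeddings]
    return {
        (p, v): cosine_distance(embeddings[p], embeddings[v])
        for p in pronouns
        for v in verbs
    }


if __name__ == "__main__":
    # Tags are supplied by hand here; a real pipeline would run a POS tagger.
    tagged = [("send", "VERB"), ("them", "PRON"), ("home", "NOUN")]
    print(othering_distances(tagged, TOY_EMBEDDINGS))
```

In a sketch like this, the resulting distances would become numeric features fed to a downstream classifier alongside other features; a small distance between an out-group pronoun and an exclusionary verb is the kind of signal an “othering” feature might try to capture, though the paper's actual construction may differ.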

Updated: 2019-07-26