Offensive, aggressive, and hate speech analysis: From data-centric to human-centered approach
Information Processing & Management (IF 7.4), Pub Date: 2021-06-03, DOI: 10.1016/j.ipm.2021.102643
Jan Kocoń, Alicja Figas, Marcin Gruza, Daria Puchalska, Tomasz Kajdanowicz, Przemysław Kazienko

Analysis of subjective texts such as offensive content or hate speech is a great challenge, especially with regard to the annotation process. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high-quality reference source. However, annotation guidelines for subjective content may restrict the annotators’ freedom of decision making. Motivated by the moderate annotation agreement observed in offensive content datasets, we hypothesize that personalized approaches to offensive content identification should be in place. Thus, we propose two novel perspectives of perception: group-based and individual. Using the demographics of annotators as well as embeddings of their previous decisions (annotated texts), we are able to train multimodal models (including transformer-based ones) adjusted to personal or community profiles. Based on the agreement of individuals and groups, we experimentally showed that annotator group agreeability strongly correlates with offensive content recognition quality. The proposed personalized approaches enabled us to create models that adapt to personal user beliefs rather than to an agreed understanding of offensiveness. Overall, our individualized approaches to offensive content classification outperform classic data-centric methods that generalize offensiveness perception, and this holds for all six tested models. Additionally, we developed requirements for annotation procedures, personalization, and content processing to make such solutions human-centered.
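The abstract gives no implementation details, so the following minimal PyTorch sketch is only an illustration of the individual (personalized) perspective described above, not the authors' code: it fuses a precomputed text embedding (e.g., from a transformer encoder) with an annotator profile vector built from demographics and past decisions, and predicts offensiveness as judged by that specific annotator. All names, dimensions, and the fusion-by-concatenation design are assumptions.

import torch
import torch.nn as nn

class PersonalizedOffensivenessClassifier(nn.Module):
    """Toy sketch: combine a text embedding with an annotator profile
    (demographics + aggregated past decisions) and predict whether that
    particular annotator would label the text as offensive."""
    def __init__(self, text_dim=768, profile_dim=16, hidden_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + profile_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # single logit: offensive vs. not, per annotator
        )

    def forward(self, text_emb, profile):
        # text_emb: (batch, text_dim) sentence embedding, e.g. from a transformer encoder
        # profile:  (batch, profile_dim) annotator demographics / decision-history features
        return self.head(torch.cat([text_emb, profile], dim=-1)).squeeze(-1)

# Usage with random tensors standing in for real features
model = PersonalizedOffensivenessClassifier()
text_emb = torch.randn(4, 768)   # embeddings of 4 texts
profile = torch.randn(4, 16)     # profiles of the 4 annotators who judged them
logits = model(text_emb, profile)

Under the same assumptions, replacing the per-annotator profile with an aggregate profile of an annotator community would correspond to the group-based perspective mentioned in the abstract.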




Updated: 2021-06-03