Automatic classification of participant roles in cyberbullying: Can we detect victims, bullies, and bystanders in social media text?
Natural Language Engineering (IF 2.5), Pub Date: 2020-11-18, DOI: 10.1017/s135132492000056x
Gilles Jacobs, Cynthia Van Hee, Véronique Hoste

Successful prevention of cyberbullying depends on the adequate detection of harmful messages. Given the impossibility of human moderation on the Social Web, intelligent systems are required to identify clues of cyberbullying automatically. Much work on cyberbullying detection focuses on detecting abusive language without analyzing the severity of the event or the participants involved. Automatic analysis of participant roles in cyberbullying traces enables targeted bullying prevention strategies. In this paper, we aim to automatically detect the different participant roles involved in textual cyberbullying traces, including bullies, victims, and bystanders. We describe the construction of two cyberbullying corpora (a Dutch and an English corpus) that were both manually annotated with bullying types and participant roles, and we perform a series of multiclass classification experiments to determine the feasibility of text-based cyberbullying participant role detection. The representative datasets present a data imbalance problem, for which we investigate feature filtering and data resampling as skew mitigation techniques. We investigate the performance of feature-engineered single and ensemble classifier setups as well as transformer-based pretrained language models (PLMs). Cross-validation experiments revealed promising results for the detection of cyberbullying roles using PLM fine-tuning techniques, with the best classifier for English (RoBERTa) yielding a macro-averaged ${F_1}$-score of 55.84%, and the best one for Dutch (RobBERT) yielding an ${F_1}$-score of 56.73%. Experiment replication data and source code are available at https://osf.io/nb2r3.
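
The abstract mentions data resampling as a skew-mitigation technique, feature-engineered classifiers, and macro-averaged ${F_1}$ as the evaluation metric for the imbalanced role labels. The sketch below is not the authors' released pipeline (that is available at https://osf.io/nb2r3); it is a minimal illustration, with made-up toy texts and an assumed TF-IDF plus linear SVM setup, of random oversampling on skewed participant-role labels and of how macro-averaged ${F_1}$ scores such a multiclass role classifier.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from sklearn.utils import resample

ROLES = ["bully", "victim", "bystander", "other"]

# Toy, heavily skewed data standing in for annotated cyberbullying traces.
texts = [
    "you are such a loser, nobody likes you",    # bully (harasser)
    "please stop saying that, it really hurts",  # victim
    "leave her alone, this is not okay",         # bystander-defender
    "haha good one, he totally deserves it",     # bystander-assistant
] + ["just a normal post about the weather today"] * 20  # majority 'other' class
labels = ["bully", "victim", "bystander", "bystander"] + ["other"] * 20

# Random oversampling: upsample every role to the majority-class count so the
# classifier is not dominated by the 'other' class.
counts = Counter(labels)
max_n = max(counts.values())
texts_res, labels_res = [], []
for role in counts:
    idx = [i for i, y in enumerate(labels) if y == role]
    idx_up = resample(idx, replace=True, n_samples=max_n, random_state=42)
    texts_res += [texts[i] for i in idx_up]
    labels_res += [role] * max_n

# Placeholder feature-engineered setup: TF-IDF word n-grams + linear SVM.
vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LinearSVC()
clf.fit(vec.fit_transform(texts_res), labels_res)

# Macro-averaged F1 weights every role equally regardless of its frequency,
# which is why it suits skewed role distributions. (Predicting on the training
# texts here only demonstrates the metric call, not a cross-validation protocol.)
preds = clf.predict(vec.transform(texts))
print("macro F1:", f1_score(labels, preds, average="macro", labels=ROLES, zero_division=0))
```

The same macro-averaged ${F_1}$ call would be used to score a fine-tuned PLM such as RoBERTa or RobBERT; only the classifier producing `preds` changes.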

Updated: 2020-11-18