Resources and benchmark corpora for hate speech detection: a systematic review, Language Resources and Evaluation



Language Resources and Evaluation (IF 1.7), Pub Date: 2020-09-30, DOI: 10.1007/s10579-020-09502-8
Fabio Poletto, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Viviana Patti

Hate speech in social media is a complex phenomenon whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica also play an important role in the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.


Updated: 2020-09-30
