An Online Multilingual Hate speech Recognition System
arXiv - CS - Computation and Language. Pub Date: 2020-11-23, DOI: arxiv-2011.11523
Neeraj Vashistha, Arkaitz Zubiaga

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but it has also brought risks and harms. As the volume of harmful content online, such as hate speech, has grown beyond what humans can manage, interest in the academic community in automated means of hate speech detection has increased. In this study, we combine six publicly available datasets into a single homogeneous dataset and classify its samples into three classes: abusive, hateful, or neither. We create a baseline model and improve its performance scores using various optimisation techniques. After attaining a competitive performance score, we build a tool that identifies and scores a page using an effective metric in near-real time, and uses those scores as feedback to re-train our model. We demonstrate the competitive performance of our multilingual model on two languages, English and Hindi, where it achieves performance comparable or superior to most monolingual models.
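The abstract does not describe how the six datasets are merged. The sketch below shows one generic way such label harmonisation can be done, assuming each source dataset supplies (text, label) pairs with its own label vocabulary. The dataset names (`dataset_a`, `dataset_b`, `dataset_c`), their label strings, and the mapping choices are hypothetical placeholders, not the datasets or mappings used in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# The three target classes named in the abstract.
TARGET_CLASSES = ("abusive", "hateful", "neither")

# HYPOTHETICAL per-dataset label mappings: source names and original label
# strings are illustrative placeholders, not the paper's six datasets.
LABEL_MAPS: Dict[str, Dict[str, str]] = {
    "dataset_a": {"offensive": "abusive", "hate": "hateful", "none": "neither"},
    "dataset_b": {"HOF": "abusive", "NOT": "neither"},
    "dataset_c": {"racism": "hateful", "sexism": "hateful", "normal": "neither"},
}


def harmonise(records: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Map (source, text, original_label) triples onto the shared label set.

    Records with an unknown source or label are dropped rather than guessed,
    and exact duplicate texts (after stripping whitespace) are kept only once.
    """
    merged: List[Tuple[str, str]] = []
    seen: set = set()
    for source, text, label in records:
        target = LABEL_MAPS.get(source, {}).get(label)
        if target is None:
            continue  # unmapped label: skip instead of inventing a class
        cleaned = text.strip()
        if cleaned in seen:
            continue  # same text already contributed by an earlier record
        seen.add(cleaned)
        merged.append((cleaned, target))
    return merged


# Usage on a few made-up records.
sample = [
    ("dataset_a", "example text 1", "hate"),
    ("dataset_b", "example text 2", "NOT"),
    ("dataset_c", "example text 3", "sexism"),
    ("dataset_c", "example text 4", "unknown"),
    ("dataset_a", " example text 1 ", "hate"),
]
merged = harmonise(sample)
print(Counter(label for _, label in merged))
```

Dropping unmapped labels keeps the merged set consistent with the three target classes; a real pipeline would need to decide case by case how coarse source labels (such as a combined "hate or offensive" tag) should be assigned.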

Updated: 2020-11-25