A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection
Information Processing & Management ( IF 7.4 ) Pub Date : 2021-03-01 , DOI: 10.1016/j.ipm.2021.102544
Endang Wahyu Pamungkas , Valerio Basile , Viviana Patti

Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative use of language. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting. In this work, we explore hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. We experiment with traditional and recent neural architectures, and propose two joint-learning models that use different multilingual language representations to transfer knowledge between pairs of languages. We also evaluate the impact of additional knowledge by incorporating information from a multilingual lexicon of abusive words. The results show that our joint-learning models achieve the best performance on most languages, while a simple approach that combines machine translation with a pre-trained English language model also achieves robust performance. In contrast, Multilingual BERT fails to obtain good performance in cross-lingual hate speech detection. We also found experimentally that external knowledge from a multilingual abusive lexicon improves the models' performance, specifically in detecting the positive class. The results of our experimental evaluation highlight a number of challenges and issues in this task. One of the main challenges concerns current benchmarks for hate speech detection, in particular how bias related to the topical focus of the datasets influences classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in this specific task also remains an open problem. However, our experimental evaluation and qualitative analysis show how the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.
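The knowledge-injection idea described in the abstract — enriching a classifier's input with signals from a multilingual abusive-word lexicon — can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' implementation: the lexicon entries, function name, and feature set below are placeholders chosen for illustration, assuming a simple per-language word list and token-overlap features.

```python
# Hypothetical sketch of lexicon-based knowledge injection: token overlap
# with a multilingual abusive-word lexicon is turned into extra features
# that can be fed to a classifier alongside the text representation.
# The lexicon below is a mild toy stand-in, not real lexicon data.

ABUSIVE_LEXICON = {
    "en": {"idiot", "stupid"},
    "it": {"idiota", "stupido"},
}

def lexicon_features(text: str, lang: str) -> dict:
    """Return simple knowledge-injection features for one message."""
    tokens = text.lower().split()
    hits = sum(1 for tok in tokens if tok in ABUSIVE_LEXICON.get(lang, set()))
    return {
        "abusive_hits": hits,                         # raw match count
        "abusive_ratio": hits / max(len(tokens), 1),  # normalised by length
    }

feats = lexicon_features("you are an idiot", "en")
```

In a cross-lingual setting, features of this kind are language-independent in form (counts and ratios), so a model trained on English examples can consume the same features computed from another language's lexicon at test time.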




Last updated: 2021-03-01