Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations
arXiv - CS - Artificial Intelligence Pub Date : 2020-01-15 , DOI: arxiv-2001.05495
Pinkesh Badjatiya, Manish Gupta, Vasudeva Varma

With the ever-increasing spread of hate on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While methods for hate speech detection exist, they stereotype words and hence suffer from inherently biased training. Bias removal has traditionally been studied for structured datasets, but we aim to mitigate bias in unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias of any model and propose algorithms for identifying the set of words that the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalization provides an effective way to encode knowledge, because the abstractions it provides not only generalize content but also facilitate retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on overall performance and on mitigating bias. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge bases and can easily be extended to other tasks.
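The core idea of knowledge-based generalization can be sketched as follows: tokens that a bias-quantification step flags as stereotyped are replaced by a more abstract term drawn from a knowledge base, so the classifier trains on generalized content. The word set and hypernym map below are illustrative assumptions, not the paper's actual resources (the paper draws on existing knowledge bases).

```python
# Minimal sketch of knowledge-based generalization for hate speech
# detection preprocessing. Both lookup tables are hypothetical
# placeholders for (a) the set of words an analysis found the model
# stereotypes and (b) a knowledge-base hypernym lookup.

# Hypothetical words flagged as stereotyped by the classifier.
STEREOTYPED = {"muslim", "immigrant"}

# Hypothetical generalization (hypernym) map standing in for a
# knowledge base.
HYPERNYMS = {"muslim": "person", "immigrant": "person"}

def generalize(text: str) -> str:
    """Replace stereotyped tokens with a knowledge-base abstraction."""
    out = []
    for token in text.lower().split():
        if token in STEREOTYPED:
            # Substitute the abstract term; fall back to the token
            # itself if the knowledge base has no entry for it.
            out.append(HYPERNYMS.get(token, token))
        else:
            out.append(token)
    return " ".join(out)

print(generalize("the immigrant spoke first"))  # -> the person spoke first
```

Training the classifier on the output of `generalize` rather than the raw text is what "forces the classifier to learn from generalized content"; different generalization policies correspond to different choices of abstraction level in the knowledge base.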

Updated: 2020-01-17