Reading Between the Demographic Lines: Resolving Sources of Bias in Toxicity Classifiers
arXiv - CS - Computers and Society. Pub Date: 2020-06-29. DOI: arxiv-2006.16402
Elizabeth Reichert, Helen Qiu, Jasmine Bayrooti

The censorship of toxic comments is often left to the judgment of imperfect models. Perspective API, a creation of Google technology incubator Jigsaw, is perhaps the most widely used toxicity classifier in industry; the model is employed by several online communities including The New York Times to identify and filter out toxic comments with the goal of preserving online safety. Unfortunately, Google's model tends to unfairly assign higher toxicity scores to comments containing words referring to the identities of commonly targeted groups (e.g., "woman," "gay," etc.) because these identities are frequently referenced in a disrespectful manner in the training data. As a result, comments generated by marginalized groups referencing their identities are often mistakenly censored. It is important to be cognizant of this unintended bias and strive to mitigate its effects. To address this issue, we have constructed several toxicity classifiers with the intention of reducing unintended bias while maintaining strong classification performance.
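The failure mode described in the abstract is easy to reproduce in miniature. The sketch below is an illustration only: it uses scikit-learn on a synthetic toy corpus (not the authors' classifiers, data, or the Perspective API) in which an identity term appears only in toxic examples, and shows a benign self-identifying comment receiving an inflated toxicity score as a result.

```python
# Illustrative sketch, not the paper's method: a tiny bag-of-words
# classifier trained on skewed data learns identity terms as toxic signals.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: the identity term "gay" occurs only in toxic examples,
# mimicking the training-data skew described in the abstract.
texts = [
    "you are an idiot",                # toxic
    "gay people are disgusting",       # toxic
    "women do not belong here",        # toxic
    "I hate everyone on this forum",   # toxic
    "great article, thanks",           # non-toxic
    "I agree with your point",         # non-toxic
    "see you at the meetup",           # non-toxic
    "nice weather today",              # non-toxic
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = toxic, 0 = non-toxic

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# A benign self-identifying comment scores higher than an otherwise
# identical one, purely because "gay" co-occurred with toxic labels.
for comment in ["I am a gay person", "I am a tall person"]:
    p_toxic = model.predict_proba([comment])[0, 1]
    print(f"{comment!r}: toxicity {p_toxic:.2f}")
```

Mitigation work of the kind the paper describes is typically evaluated against exactly this failure mode, e.g., by checking that benign identity-term templates receive toxicity scores comparable to neutral controls.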

Last updated: 2020-07-01