Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytics systems, The VLDB Journal

The VLDB Journal (IF 2.8), Pub Date: 2021-05-05, DOI: 10.1007/s00778-021-00671-8
Agathe Balayn, Christoph Lofi, Geert-Jan Houben

The increasing use of data-driven decision support systems in industry and governments is accompanied by the discovery of a plethora of bias and unfairness issues in the outputs of these systems. Multiple computer science communities, especially machine learning, have started to tackle this problem, often developing algorithmic solutions that mitigate biases to obtain fairer outputs. However, one of the core underlying causes of unfairness is bias in training data, which such approaches do not fully cover. In particular, bias in data is not yet a central topic in data engineering and management research. We survey research on bias and unfairness in several computer science domains, distinguishing between data management publications and other domains. The survey covers the creation of fairness metrics, fairness identification and mitigation methods, software engineering approaches, and biases in crowdsourcing activities. We identify relevant research gaps and show which data management activities could be repurposed to handle biases and which ones might reinforce such biases. In the second part, we argue for a novel data-centered approach that overcomes the limitations of current algorithm-centered methods. This approach focuses on eliciting and enforcing fairness requirements and constraints on the data that systems are trained, validated, and used on. We argue for the need to extend database management systems to handle such constraints and mitigation methods. We discuss the associated future research directions regarding algorithms, formalization, modelling, users, and systems.

Updated: 2021-05-05
