Biases in Data Science Lifecycle
arXiv - CS - Computers and Society Pub Date : 2020-09-10 , DOI: arxiv-2009.09795
Dinh-An Ho and Oya Beyan

In recent years, data science has become an indispensable part of our society. We have grown reliant on it because it offers value and new insights from data in any field: business, socializing, research and society. At the same time, this reliance raises the question of how justified we are in placing our trust in these technologies. There is a risk that such powers may lead to biased, inappropriate or unintended actions. Ethical issues that may arise from data science practices should therefore be considered carefully, and the resulting problems should be identified during the data science lifecycle and mitigated where possible. However, a typical data scientist does not have enough knowledge to identify these challenges, and it is not always possible to include an ethics expert during data science production. The aim of this study is to provide a practical guideline for data scientists and to increase their awareness. In this work, we reviewed different sources of bias and grouped them under the stages of the data science lifecycle. The work is still in progress. The aim of early publication is to collect community feedback and improve the curated knowledge base of bias types and solutions.

Updated: 2020-10-28