Datasheets for Datasets
arXiv - CS - Databases, Pub Date: 2018-03-23, DOI: arxiv-1803.09010
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford

The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied by a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied by a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.

Updated: 2020-03-20