Dataset Definition Standard (DDS)
arXiv - CS - Databases Pub Date : 2021-01-07 , DOI: arxiv-2101.03020
Cyril Cappi, Camille Chapdelaine, Laurent Gardes, Eric Jenn, Baptiste Lefevre, Sylvaine Picard, Thomas Soumarmon

This document gives a set of recommendations for building and manipulating the datasets used to develop and/or validate machine learning models such as deep neural networks. It is one of the three documents defined in [1] to ensure the quality of datasets. It is a work in progress, since good practices evolve along with our understanding of machine learning. The document is divided into three main parts. Section 2 addresses the data collection activity. Section 3 gives recommendations about the annotation process. Finally, Section 4 gives recommendations concerning the breakdown between train, validation, and test datasets. In each part, we first define the desired properties at stake, then explain the objectives targeted to meet these properties, and finally state the recommendations to reach these objectives.

Updated: 2021-01-11