Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards,arXiv - CS - Databases


arXiv - CS - Databases. Pub Date: 2021-08-16, DOI: arxiv-2108.07374
Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, Yacine Jernite

Developing documentation guidelines and easy-to-use templates for datasets and models is a challenging task, especially given the variety of backgrounds, skills, and incentives of the people involved in building natural language processing (NLP) tools. Nevertheless, the adoption of standard documentation practices across the field of NLP promotes more accessible and detailed descriptions of NLP datasets and models, while supporting researchers and developers in reflecting on their work. To help with the standardization of documentation, we present two case studies of efforts that aim to develop reusable documentation templates: the HuggingFace data card, a general-purpose card for datasets in NLP, and the GEM benchmark data and model cards, which focus on natural language generation. We describe our process for developing these templates, including the identification of relevant stakeholder groups, the definition of a set of guiding principles, the use of existing templates as our foundation, and iterative revisions based on feedback.


Updated: 2021-08-19
