A FAIR and AI-ready Higgs Boson Decay Dataset
arXiv - CS - Databases, Pub Date: 2021-08-04, DOI: arxiv-2108.02214
Yifan Chen, E. A. Huerta, Javier Duarte, Philip Harris, Daniel S. Katz, Mark S. Neubauer, Daniel Diaz, Farouk Mokhtar, Raghav Kansal, Sang Eon Park, Volodymyr V. Kindratenko, Zhizhen Zhao, Roger Rusack

To enable the reusability of massive scientific datasets by humans and machines, researchers aim to create scientific datasets that adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets each FAIR principle. We then demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We also use other available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to facilitate an understanding and exploration of the dataset, including visualization of its elements. This study marks the first in a planned series of articles that will guide scientists in the creation and quantification of FAIRness in high energy particle physics datasets and AI models.

Updated: 2021-08-07