Tracking materials science data lineage to manage millions of materials experiments and analyses, npj Computational Materials

npj Computational Materials (IF 9.7), Pub Date: 2019-07-26, DOI: 10.1038/s41524-019-0216-x
Edwin Soedarmadji, Helge S. Stein, Santosh K. Suram, Dan Guevarra, John M. Gregoire

In an era of rapid advancement of algorithms that extract knowledge from data, data and metadata management are increasingly critical to research success. In materials science, there are few examples of experimental databases that contain many different types of information, and compared with other disciplines, the database sizes are relatively small. Underlying these issues are the challenges in managing and linking data across disparate synthesis and characterization experiments, which we address with the development of a lightweight data management framework that is generally applicable for experimental science and beyond. Five years of managing experiments with this system has yielded the Materials Experiment and Analysis Database (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments, as well as the analysis and distillation of that data into property and performance metrics via software in an accompanying open source repository. The unprecedented quantity and diversity of experimental data are searchable by experiment and analysis attributes generated by both researchers and data processing software. The search web interface allows users to visualize their search results and download zipped packages of data with full annotations of their lineage. The enormity of the data provides substantial challenges and opportunities for incorporating data science in the physical sciences, and MEAD’s data and algorithm management framework will foster increased incorporation of automation and autonomous discovery in materials and chemistry research.
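The abstract describes linking raw data, experiments, and analyses so that any derived result can be traced back to its sources, searched by attributes, and downloaded as a zip annotated with its lineage. The following is a minimal Python sketch of that general idea under stated assumptions: the `Record` and `LineageStore` names, the `raw`/`experiment`/`analysis` kinds, the attribute values, and the `lineage.json` manifest layout are all hypothetical illustrations. They are not MEAD's actual file formats or the API of its accompanying open source repository.

```python
from __future__ import annotations

import io
import json
import zipfile
from dataclasses import dataclass, field


@dataclass
class Record:
    """One node in a lineage graph (hypothetical schema, not MEAD's file format)."""
    rid: str                                            # unique record id
    kind: str                                           # e.g. "raw", "experiment", "analysis"
    attrs: dict = field(default_factory=dict)           # searchable attributes
    parents: list[str] = field(default_factory=list)    # ids this record was derived from


class LineageStore:
    """Hypothetical in-memory store that links derived records to their sources."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def add(self, rec: Record) -> None:
        # Refuse dangling links so every lineage chain ends at a stored record.
        missing = [p for p in rec.parents if p not in self._records]
        if missing:
            raise KeyError(f"unknown parent ids: {missing}")
        self._records[rec.rid] = rec

    def search(self, **criteria) -> list[Record]:
        # Exact-match attribute search, returned in insertion order.
        return [r for r in self._records.values()
                if all(r.attrs.get(k) == v for k, v in criteria.items())]

    def lineage(self, rid: str) -> list[str]:
        # Pre-order depth-first walk over parent links; each id appears once.
        order: list[str] = []
        visited: set[str] = set()
        stack = [rid]
        while stack:
            cur = stack.pop()
            if cur in visited:
                continue
            visited.add(cur)
            order.append(cur)
            stack.extend(reversed(self._records[cur].parents))
        return order

    def package(self, rid: str) -> bytes:
        # Zip the record, all of its ancestors, and a lineage.json manifest.
        ids = self.lineage(rid)
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for i in ids:
                r = self._records[i]
                zf.writestr(f"{r.kind}/{r.rid}.json",
                            json.dumps({"attrs": r.attrs, "parents": r.parents}))
            zf.writestr("lineage.json",
                        json.dumps({"root": rid, "ancestors": ids[1:]}))
        return buf.getvalue()


# Usage with made-up records: two raw measurements, one experiment,
# and one analysis that draws on both the experiment and a raw file.
store = LineageStore()
store.add(Record("raw1", "raw", {"technique": "XRF"}))
store.add(Record("raw2", "raw", {"technique": "CV"}))
store.add(Record("exp1", "experiment", {"technique": "CV", "plate": 101}, ["raw2"]))
store.add(Record("ana1", "analysis", {"metric": "overpotential"}, ["exp1", "raw1"]))
print(store.lineage("ana1"))   # ['ana1', 'exp1', 'raw2', 'raw1']
```

Storing parent links on each derived record turns lineage into a walk over a directed acyclic graph, which lets one analysis draw on several experiments. Any exported package can then carry a manifest naming every source it depends on.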

Updated: 2019-11-18