Benchmark AFLOW Data Sets for Machine Learning
Integrating Materials and Manufacturing Innovation (IF 2.4), Pub Date: 2020-05-27, DOI: 10.1007/s40192-020-00174-4
Conrad L. Clement, Steven K. Kauwe, Taylor D. Sparks

Materials informatics is increasingly finding ways to exploit machine learning algorithms. Techniques such as decision trees, ensemble methods, support vector machines, and a variety of neural network architectures are used to predict likely material characteristics and property values. Supplemented with laboratory synthesis, applications of machine learning to compound discovery and characterization represent one of the most promising research directions in materials informatics. A shortcoming of this trend, in its current form, is a lack of standardized materials data sets on which to train, validate, and test model effectiveness. Applied machine learning research depends on benchmark data to make sense of its results. Fixed, predetermined data sets allow for rigorous model assessment and comparison. Machine learning publications that do not refer to benchmarks are often hard to contextualize and reproduce. In this data descriptor article, we present a collection of data sets of different material properties taken from the AFLOW database. We describe them, the procedures that generated them, and their use as potential benchmarks. We provide a compressed ZIP file containing the data sets and a GitHub repository of associated Python code. Finally, we discuss opportunities for future work incorporating the data sets and creating similar benchmark collections.

Updated: 2020-05-27