Small data materials design with machine learning: When the average model knows best, Journal of Applied Physics



Journal of Applied Physics (IF 3.2), Pub Date: 2020-08-07, DOI: 10.1063/5.0012285
Danny E. P. Vanpoucke (1, 2), Onno S. J. van Knippenberg (3), Ko Hermans (3), Katrien V. Bernaerts (1), Siamak Mehrkanoon (4)


Machine learning is quickly becoming an important tool in modern materials design. Where many of its successes are rooted in huge datasets, the most common applications in academic and industrial materials design deal with datasets of at best a few tens of data points. Harnessing the power of machine learning in this context is, therefore, of considerable importance. In this work, we investigate the intricacies introduced by these small datasets. We show that individual data points introduce a significant chance factor in both model training and quality measurement. This chance factor can be mitigated by the introduction of an ensemble-averaged model. This model presents the highest accuracy, while at the same time, it is robust with regard to changing the dataset size. Furthermore, as only a single model instance needs to be stored and evaluated, it provides a highly efficient model for prediction purposes, ideally suited for the practical materials scientist.
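The ensemble-averaging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain linear least-squares model, random 80/20 train/test splits, and synthetic data, and the helper names (`fit_linear`, `predict`, `ensemble_average`) are hypothetical. For a model that is linear in its coefficients, averaging the coefficients of many model instances gives the same predictions as averaging the instances' predictions, so only one coefficient vector needs to be stored.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate the linear model defined by coef on the rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def ensemble_average(X, y, n_models=200, train_frac=0.8, seed=0):
    """Fit one model per random train/test split and average the coefficients.

    Returns the averaged coefficient vector, all individual coefficient
    vectors, and the test RMSE of every individual model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Keep at least p + 1 training points and at least one test point.
    n_train = min(n - 1, max(p + 1, int(round(train_frac * n))))
    coefs, test_rmse = [], []
    for _ in range(n_models):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        c = fit_linear(X[train], y[train])
        coefs.append(c)
        resid = predict(c, X[test]) - y[test]
        test_rmse.append(np.sqrt(np.mean(resid ** 2)))
    coefs = np.array(coefs)
    return coefs.mean(axis=0), coefs, np.array(test_rmse)

# Synthetic small dataset (assumption): 20 points, 2 features, noisy linear target.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=20)

avg_coef, coefs, rmse = ensemble_average(X, y)
print("averaged coefficients:", avg_coef)
print("test RMSE over splits: min", rmse.min(), "max", rmse.max())
```

The spread between the minimum and maximum test RMSE shows how much the quality estimate of a single model depends on which few points happen to land in the test set. The averaged coefficient vector is a single model instance that can be evaluated as cheaply as any individual fit.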


Updated: 2020-08-07
