Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition
npj Computational Materials (IF 9.7), Pub Date: 2019-11-18, DOI: 10.1038/s41524-019-0239-3
Christopher Sutton, Luca M. Ghiringhelli, Takenori Yamamoto, Yury Lysogorskiy, Lars Blumenthal, Thomas Hammerschmidt, Jacek R. Golebiowski, Xiangyue Liu, Angelo Ziletti, Matthias Scheffler

A public data-analytics competition was organized by the Novel Materials Discovery (NOMAD) Centre of Excellence and hosted on the online platform Kaggle, using a dataset of 3,000 $(\mathrm{Al}_x\mathrm{Ga}_y\mathrm{In}_{1-x-y})_2\mathrm{O}_3$ compounds. Its aim was to identify the best machine-learning (ML) model for predicting two key physical properties relevant to optoelectronic applications: the electronic bandgap energy and the crystalline formation energy. Here, we present a summary of the top-three ranked ML approaches. The first-place solution was based on a crystal-graph representation that is new to ML of materials properties. The second-place model combined many candidate descriptors, drawn from a set of compositional, atomic-environment-based, and average structural properties, with the light gradient-boosting machine (LightGBM) regression model. The third-place model employed the smooth overlap of atomic positions (SOAP) representation with a neural network. The Pearson correlation among the prediction errors of nine ML models (obtained by combining each of the top-three ranked representations with each of the three regression models) was examined to gain insight into whether the representation or the regression model determines the overall model performance. Ensembling relatively decorrelated models, as judged by this Pearson correlation, leads to an even higher prediction accuracy.
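The error-correlation analysis and the ensembling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the three "models" are synthetic predictions generated with NumPy, the helper names (`error_correlation`, `ensemble_mean`, `mae`) are hypothetical, a plain average stands in for whatever ensembling scheme the paper used, and the mean absolute error is chosen here only for illustration.

```python
import numpy as np


def error_correlation(y_true, predictions):
    """Pearson correlation matrix of signed prediction errors (one row/column per model)."""
    errors = np.asarray(predictions) - np.asarray(y_true)  # shape: (n_models, n_samples)
    return np.corrcoef(errors)


def ensemble_mean(predictions):
    """Simple ensemble: unweighted average of the models' predictions."""
    return np.mean(np.asarray(predictions), axis=0)


def mae(y_true, y_pred):
    """Mean absolute error (an illustrative metric, assumed here)."""
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))


# Synthetic stand-in data (assumption): targets plus hand-made error patterns.
rng = np.random.default_rng(0)
n = 2000
y_true = rng.uniform(0.0, 5.0, size=n)

shared = rng.normal(0.0, 0.1, size=n)                  # error component shared by A and B
pred_a = y_true + shared + rng.normal(0.0, 0.02, size=n)
pred_b = y_true + shared + rng.normal(0.0, 0.02, size=n)
pred_c = y_true + rng.normal(0.0, 0.1, size=n)         # errors independent of A and B

corr = error_correlation(y_true, [pred_a, pred_b, pred_c])

mae_a = mae(y_true, pred_a)
mae_c = mae(y_true, pred_c)
mae_ab = mae(y_true, ensemble_mean([pred_a, pred_b]))  # strongly correlated pair
mae_ac = mae(y_true, ensemble_mean([pred_a, pred_c]))  # nearly decorrelated pair

print("error correlation matrix:\n", np.round(corr, 2))
print("MAE A:", round(mae_a, 4), "MAE C:", round(mae_c, 4))
print("MAE ensemble A+B:", round(mae_ab, 4), "MAE ensemble A+C:", round(mae_ac, 4))
```

In this toy setting, averaging the two models whose errors are nearly uncorrelated reduces the error more than averaging the two whose errors are strongly correlated, which is the effect the abstract attributes to ensembling relatively decorrelated models.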




Updated: 2019-11-18