Comment on “A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials” by Hattrick-Simpers, et al., Molecular Systems Design & Engineering, 2018, 3, 509


Molecular Systems Design & Engineering (IF 3.6), Pub Date: 2020/02/07, DOI: 10.1039/c9me00138g
Jason Hattrick-Simpers (1, 2, 3), Brian DeCost (1, 2, 3)


In this short comment we present a reproducibility study of our recent manuscript, “A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials” (Hattrick-Simpers et al., Mol. Syst. Des. Eng., 2018, 3, 509), using a suite of open-source materials data science tools. The principal goal is to let the interested reader reproduce our previous machine learning model with minimal effort and then make predictions on the holdout set used in that manuscript. In transcribing our model from the Java-based Magpie/Weka framework to the Python-based Matminer/scikit-learn framework, we noticed an unexpected discrepancy between the predictions of the two platforms. To compare the performance of nominally equivalent random forest (RF) regression models across the two platforms, we trained and evaluated 50 replicate models per platform, each on a random 90% subset of the full hydride training set. The Magpie/Weka models showed a somewhat higher mean absolute error on the holdout set (5.6 ± 0.4) than the Matminer/scikit-learn models (4.2 ± 0.4), although the validation statistics were within error of one another. A full analysis of the ultimate source of the variance in these predictions is beyond the scope of this comment, but we speculate that some of it results from differences in how Magpie treats duplicate compositions in the training set and/or differences in the RF implementations of Weka and scikit-learn.
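The replicate protocol described in the abstract (50 models, each trained on a random 90% subset and scored on a fixed holdout set) can be sketched in scikit-learn as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins rather than the hydride data set, the function name `replicate_holdout_mae` and the forest settings are hypothetical, and reporting the spread as a sample standard deviation is an assumption, since the abstract does not say how the ± values were computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def replicate_holdout_mae(X_train, y_train, X_holdout, y_holdout,
                          n_replicates=50, subset_fraction=0.9, seed=0):
    """Train one random forest per random training subset; return holdout MAEs."""
    rng = np.random.default_rng(seed)
    n_subset = int(round(subset_fraction * len(X_train)))
    maes = []
    for _ in range(n_replicates):
        # Draw a random subset of the training rows without replacement.
        idx = rng.choice(len(X_train), size=n_subset, replace=False)
        model = RandomForestRegressor(
            n_estimators=25,  # assumed value, kept small so the sketch runs fast
            random_state=int(rng.integers(2**31 - 1)),
        )
        model.fit(X_train[idx], y_train[idx])
        # Score every replicate on the same fixed holdout set.
        maes.append(mean_absolute_error(y_holdout, model.predict(X_holdout)))
    return np.array(maes)


# Synthetic stand-in data (NOT the hydride data set from the paper).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(240, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + data_rng.normal(scale=0.5, size=240)
X_train, y_train = X[:200], y[:200]
X_holdout, y_holdout = X[200:], y[200:]

maes = replicate_holdout_mae(X_train, y_train, X_holdout, y_holdout)
# Assumed summary: mean and sample standard deviation over the replicates.
print(f"holdout MAE: {maes.mean():.2f} ± {maes.std(ddof=1):.2f}")
```

Running the same function on features and predictions produced by a second platform would give a second distribution of holdout errors to compare against, which is the kind of side-by-side summary the abstract reports.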


Updated: 2020-02-24
