Reliable and explainable machine-learning methods for accelerated material discovery
npj Computational Materials (IF 9.7), Pub Date: 2019-11-14, DOI: 10.1038/s41524-019-0248-2
Bhavya Kailkhura, Brian Gallagher, Sookyung Kim, Anna Hiszpanski, T. Yong-Jin Han

Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.
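
The abstract's first point, that standard quality measures can mislead on imbalanced data, can be illustrated with a small sketch. The example below is not the paper's evaluation metric or pipeline; it is a toy illustration under stated assumptions: the targets are synthetic, the "model" is a constant predictor, and the helper names `mae` and `per_bin_mae` are hypothetical. It compares a single aggregate mean absolute error with the same error computed separately per target-value range.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error over all samples."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def per_bin_mae(y_true, y_pred, edges):
    """MAE computed separately for each target bin [edges[i], edges[i+1]).

    Bins that contain no samples are omitted from the result.
    """
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        errors = [abs(t - p) for t, p in zip(y_true, y_pred) if lo <= t < hi]
        if errors:
            result[(lo, hi)] = mean(errors)
    return result


rng = random.Random(0)

# Synthetic, imbalanced targets (an assumption for illustration only):
# 950 samples fall in [0, 1), and only 50 fall in the rare range [4, 5).
y_true = [rng.uniform(0.0, 1.0) for _ in range(950)]
y_true += [rng.uniform(4.0, 5.0) for _ in range(50)]

# A trivial "model" that always predicts the centre of the majority range.
y_pred = [0.5] * len(y_true)

overall = mae(y_true, y_pred)
binned = per_bin_mae(y_true, y_pred, edges=[0.0, 1.0, 4.0, 5.0])

print(f"overall MAE: {overall:.3f}")
for (lo, hi), err in binned.items():
    print(f"MAE for targets in [{lo}, {hi}): {err:.3f}")
```

In this toy setting the aggregate error looks modest because the majority range dominates the average, while the error on the rare range, which is often the range of interest in materials discovery, is several times larger. Reporting the error per range makes that gap visible.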



Updated: 2019-11-14