Statistical approach for automated weighting of datasets: Application to heat capacity data, Calphad

Calphad (IF 1.9), Pub Date: 2020-08-14, DOI: 10.1016/j.calphad.2020.101994
S. Zomorodpoosh, B. Bocklund, A. Obaied, R. Otis, Z.-K. Liu, I. Roslyakova

An essential step in CALPHAD modeling is assigning relative weights to different datasets, but there is no consensus on the best way to do so. Currently, weights for experimental or first-principles data are assigned manually, based on the knowledge and experience of the modeler. Because this manual treatment is subjective and time-consuming, the handling of such data is rapidly moving toward automated procedures built on statistical and data mining tools. In the present study, we propose an automated approach that determines the weight of each dataset using the K-Fold Cross-Validation method, modified so that each fold is selected non-randomly and the folds contain unequal numbers of observations. Researchers can use this approach as a support tool to evaluate the reliability of each dataset involved in CALPHAD modeling and to quantify the impact of weighting through statistical analysis of the corresponding model. We demonstrate the efficacy of this method by evaluating heat capacity data for fcc nickel, hcp magnesium, and bcc iron.
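
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each fold is one whole dataset (which makes the folds non-random and unequal in size), fits a simple polynomial to the remaining datasets, and scores the held-out dataset by its root-mean-square error. The function name `leave_one_dataset_out_weights`, the polynomial model, the inverse-error weighting rule, and the synthetic data are all assumptions introduced for illustration.

```python
import numpy as np


def leave_one_dataset_out_weights(datasets, degree=1):
    """Score each dataset by how well a model fitted to the others predicts it.

    datasets: dict mapping a dataset name to a (T, Cp) pair of 1-D arrays.
    Returns a dict of normalized weights that sum to 1.
    """
    errors = {}
    for held_out in datasets:
        # One fold = one whole dataset, so folds are non-random and unequal.
        others = [data for name, data in datasets.items() if name != held_out]
        T_train = np.concatenate([T for T, _ in others])
        Cp_train = np.concatenate([Cp for _, Cp in others])

        # Stand-in model: a low-order polynomial in temperature (assumption).
        coeffs = np.polyfit(T_train, Cp_train, degree)

        T_test, Cp_test = datasets[held_out]
        residuals = Cp_test - np.polyval(coeffs, T_test)
        errors[held_out] = float(np.sqrt(np.mean(residuals ** 2)))

    # Hypothetical weighting rule: inverse validation error, normalized.
    inverse = {name: 1.0 / (err + 1e-12) for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Synthetic illustration only: three datasets of unequal size, one with an offset.
rng = np.random.default_rng(0)


def make_dataset(n_points, offset=0.0):
    T = np.linspace(300.0, 1000.0, n_points)
    Cp = 25.0 + 0.01 * T + offset + rng.normal(0.0, 0.05, n_points)
    return T, Cp


datasets = {
    "A": make_dataset(20),
    "B": make_dataset(8),
    "C": make_dataset(12, offset=2.0),  # systematically biased dataset
}
weights = leave_one_dataset_out_weights(datasets)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

In this toy setup the biased dataset is predicted worst by a model fitted to the other two, so it receives the smallest weight. A real CALPHAD workflow would replace the polynomial with a physically motivated heat capacity model and would likely use a different rule for turning validation errors into weights.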

Updated: 2020-08-14
