Impact of non-normal error distributions on the benchmarking and ranking of quantum machine learning models,Machine Learning: Science and Technology


Impact of non-normal error distributions on the benchmarking and ranking of quantum machine learning models
Machine Learning: Science and Technology (IF 6.3), Pub Date: 2020-08-20, DOI: 10.1088/2632-2153/aba184
Pascal Pernot, Bing Huang, Andreas Savin


Quantum machine learning models have been gaining significant traction within atomistic simulation communities. Conventionally, relative model performances are assessed and compared using learning curves (prediction error vs. training set size). This article illustrates the limitations of using the Mean Absolute Error (MAE) for benchmarking, which are particularly relevant in the case of non-normal error distributions. We analyze more specifically the prediction error distribution of kernel ridge regression with the SLATM representation and L₂ distance metric (KRR-SLATM-L2) for effective atomization energies of QM7b molecules calculated at the CCSD(T)/cc-pVDZ level of theory. Error distributions of HF and MP2 with the same basis set, referenced to CCSD(T) values, were also assessed and compared to the KRR model. We show that the true performance of the KRR-SLATM-L2 method over the QM7b dataset is poorly assessed by the Mean Absolute Error, and can be notably improved after adaptation of the learning set.
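To make the abstract's point about the MAE concrete, here is a minimal sketch in Python. It uses synthetic errors, not the paper's data. It assumes a Laplace distribution as a stand-in for a heavy-tailed, non-normal error distribution, and uses the 95th percentile of absolute errors (`q95`, a name chosen here for illustration) as one possible complementary statistic. Neither choice is taken from the abstract.

```python
import numpy as np

def mae(errors):
    """Mean absolute error of a sample of signed errors."""
    return float(np.mean(np.abs(errors)))

def q95(errors):
    """95th percentile of the absolute errors (illustrative statistic)."""
    return float(np.percentile(np.abs(errors), 95))

rng = np.random.default_rng(0)
n = 200_000

# Synthetic error samples: one normal, one heavy-tailed (Laplace).
# These are assumptions for illustration, not data from the paper.
normal_errors = rng.normal(0.0, 1.0, size=n)
heavy_errors = rng.laplace(0.0, 1.0, size=n)

# Rescale both samples so that their MAE is identical (equal to 1).
normal_errors = normal_errors / mae(normal_errors)
heavy_errors = heavy_errors / mae(heavy_errors)

print(f"normal : MAE = {mae(normal_errors):.3f}, Q95 = {q95(normal_errors):.3f}")
print(f"laplace: MAE = {mae(heavy_errors):.3f}, Q95 = {q95(heavy_errors):.3f}")
```

Both samples report the same MAE by construction, yet the heavy-tailed sample has a visibly larger 95th-percentile absolute error, so ranking two methods by MAE alone would hide the difference in their large-error risk.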


Updated: 2020-08-20
