Formulation of Small Test Sets Using Large Test Sets for Efficient Assessment of Quantum Chemistry Methods
Journal of Chemical Theory and Computation (IF 5.5), Pub Date: 2018-07-13, DOI: 10.1021/acs.jctc.8b00514
Bun Chan

In the present study, we have examined in detail literature data on the deviations of a wide range of (mainly) DFT methods for the extensive MGCDB82 set (∼4400 data points) of main-group thermochemical quantities. We use these data and standard statistical techniques (lasso regularization and forward selection) to devise the MG8 model, which linearly combines the assessment results for a collection of small data sets to accurately estimate the mean absolute deviation (MAD) for MGCDB82. The MG8 model contains a total of 64 data points representing noncovalent interactions, isomerization energies, thermochemical properties, and barrier heights. It is thus well suited for the rapid evaluation of new quantum chemistry procedures. We propose that an MAD estimated by the MG8 model (EMAD_MG8) of ∼4 kJ mol⁻¹ be taken as an initial indicator of a highly robust quantum chemistry method, for which large deviations occur mainly for properties (such as heats of formation) that are known to be difficult to compute accurately. For methods with larger EMADs, we emphasize the importance of more thorough testing, as these methods are likely to have a larger number of outliers, and it may be less straightforward to anticipate the circumstances under which large deviations occur. In this regard, we have applied the same generally applicable statistical techniques to formulate further small-data-set models for assessing the accuracy for some properties that are covered by neither MG8 nor MGCDB82. They include the MOR13 model for metal–organic reactions, the SBG5 model for semiconductor band gaps, and the MB13 model for stress-testing methods with artificial species.
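To make the idea of "linearly combining assessment results of small data sets" concrete, here is a minimal sketch of greedy forward selection, assuming a no-intercept linear model fitted by ordinary least squares. It only illustrates the general technique: the lasso step is not shown, and the subset names (`A`, `B`, `C`), the synthetic MAD values, and helper names such as `forward_select` and `estimate_mad` are hypothetical. The actual MG8 subsets, weights, and model form are not given in this abstract.

```python
# Hypothetical sketch: greedy forward selection of small subsets whose MADs,
# linearly combined, reproduce the MAD over a large benchmark set.
# Subset names, data and the no-intercept model form are illustrative assumptions.
from typing import Dict, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear subsets)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(cols: List[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares without intercept, via the normal equations."""
    k = len(cols)
    ata = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    aty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve_linear(ata, aty)


def sse(cols: List[Sequence[float]], w: List[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the linear combination sum_k w_k * cols[k]."""
    return sum((yi - sum(wk * col[i] for wk, col in zip(w, cols))) ** 2
               for i, yi in enumerate(y))


def forward_select(subset_mads: Dict[str, Sequence[float]],
                   full_mad: Sequence[float], max_terms: int) -> Dict[str, float]:
    """Add, one at a time, the subset whose inclusion most lowers the residual SSE."""
    chosen: List[str] = []
    weights: List[float] = []
    while len(chosen) < max_terms:
        best = None
        for name in subset_mads:
            if name in chosen:
                continue
            trial = chosen + [name]
            cols = [subset_mads[t] for t in trial]
            try:
                w = fit(cols, full_mad)  # refit all weights for this candidate set
            except ValueError:
                continue  # skip a candidate collinear with those already chosen
            err = sse(cols, w, full_mad)
            if best is None or err < best[0]:
                best = (err, trial, w)
        if best is None:
            break
        _, chosen, weights = best
    return dict(zip(chosen, weights))


def estimate_mad(weights: Dict[str, float], new_subset_mads: Dict[str, float]) -> float:
    """Estimated large-set MAD of a new method from its MADs on the chosen subsets only."""
    return sum(w * new_subset_mads[name] for name, w in weights.items())


# Synthetic training data (kJ/mol): one entry per previously assessed method.
subset_mads = {
    "A": [2.0, 4.0, 6.0, 3.0, 5.0],
    "B": [1.0, 3.0, 2.0, 4.0, 6.0],
    "C": [5.0, 1.0, 2.0, 2.0, 3.0],
}
full_mad = [1.3, 2.9, 3.6, 2.7, 4.3]  # constructed as 0.5*A + 0.3*B
weights = forward_select(subset_mads, full_mad, max_terms=2)
emad = estimate_mad(weights, {"A": 6.0, "B": 4.0})
print(weights, round(emad, 3))
```

Because the synthetic large-set MADs were built as an exact combination of subsets `A` and `B`, the greedy search should recover those two subsets with weights close to 0.5 and 0.3. A new method then needs to be run only on the selected subsets to obtain its estimated MAD. With real benchmark data the fit is not exact, and a penalized method such as the lasso (also mentioned in the abstract) would typically be used alongside or instead of plain least squares to keep the selected model small.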

Updated: 2018-07-13