Bias-Free Chemically Diverse Test Sets from Machine Learning
ACS Combinatorial Science (IF 3.903), Pub Date: 2017-07-27, DOI: 10.1021/acscombsci.7b00087
Ellen T. Swann 1 , Michael Fernandez 1 , Michelle L. Coote 2 , Amanda S. Barnard 1

Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist’s intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.
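
As a rough illustration of two ingredients named in the abstract, the sketch below builds a Coulomb matrix for a small molecule and picks K-means "prototypes" (the data points closest to each cluster centroid) from a set of fixed-length descriptor vectors. It is not the authors' implementation: the function names `coulomb_matrix` and `kmeans_prototypes` are hypothetical, the Coulomb matrix uses the commonly cited form (0.5·Z^2.4 on the diagonal, Z_i·Z_j/r_ij off the diagonal), and the clustering is plain Lloyd's K-means with a fixed random seed. How the paper pads, sorts, or otherwise vectorizes its descriptors is not stated in the abstract and is not modeled here.

```python
import math
import random


def coulomb_matrix(charges, coords):
    """Commonly used Coulomb matrix: 0.5*Z^2.4 on the diagonal,
    Z_i*Z_j / |R_i - R_j| off the diagonal (assumed form)."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m


def kmeans_prototypes(points, k, iters=50, seed=0):
    """Run Lloyd's K-means, then return the sorted indices of the data
    points nearest to each final centroid (the 'prototypes')."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # A prototype is an actual data point, not the centroid itself.
    return sorted(
        {min(range(len(points)), key=lambda i: math.dist(points[i], cen))
         for cen in centroids}
    )


if __name__ == "__main__":
    # H2 with an illustrative bond length of 1.4 (atomic units assumed).
    print(coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]))
    # Toy 2-D "descriptors" forming two well-separated groups.
    toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    print(kmeans_prototypes(toy, k=2))
```

Returning real data points rather than centroids matters for test-set selection, because a centroid in descriptor space generally does not correspond to any actual molecule.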

Updated: 2017-07-28