Classification of spatially resolved molecular fingerprints for machine learning applications and development of a codebase for their implementation
Molecular Systems Design & Engineering (IF 3.2), Pub Date: 2018-02-20, DOI: 10.1039/c8me00003d
Mardochee Reveil, Paulette Clancy

A direct mapping between structure and properties, across many classes of materials, is often the ultimate goal of materials researchers. Recent progress in machine learning has opened a unique path to developing such mappings from empirical data. This opportunity has created a need for advanced structural representations suitable for use with current machine learning algorithms. A number of such representations, termed “molecular fingerprints” or descriptors, have been proposed over the years for this purpose. In this paper, we introduce a classification framework to better explain and interpret existing fingerprinting schemes in the literature, with a focus on those with spatial resolution. We then present the implementation of SEING, a new codebase for computing those fingerprints, and we demonstrate its capabilities by building k-nearest neighbor (k-NN) models for force prediction that achieve a generalization accuracy of 0.1 meV Å⁻¹ and an R² score as high as 0.99 in testing. Our results indicate that simple and generally overlooked k-NN models could be very promising compared to approaches such as neural networks, Gaussian processes, and support vector machines, which are more commonly used for machine learning-based predictions in computational materials science.
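To make the idea of a spatially resolved fingerprint feeding a k-NN model concrete, here is a minimal sketch in plain Python. It is not SEING's API and not the authors' implementation. The function names (`radial_fingerprint`, `knn_predict`), the Gaussian-with-cosine-cutoff form, the parameter values, and the synthetic dimer data are all assumptions chosen for illustration. Because a purely radial fingerprint is rotation-invariant, the sketch predicts a scalar target rather than a force vector.

```python
import math


def radial_fingerprint(center, neighbors, etas=(0.5, 1.0, 2.0), r_cut=6.0):
    """Hypothetical radial fingerprint: one Gaussian-weighted sum per eta.

    Each neighbor within r_cut contributes exp(-eta * r^2) * fc(r), where
    fc is a cosine cutoff that decays smoothly to zero at r_cut.
    """
    fingerprint = []
    for eta in etas:
        total = 0.0
        for position in neighbors:
            r = math.dist(center, position)
            if r < r_cut:
                cutoff = 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)
                total += math.exp(-eta * r * r) * cutoff
        fingerprint.append(total)
    return fingerprint


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training fingerprints closest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Synthetic toy data (an assumption, not from the paper): dimers at
# several separations, with a made-up scalar target of 1 / r^2.
origin = (0.0, 0.0, 0.0)
separations = [1.0 + 0.25 * i for i in range(9)]
train_x = [radial_fingerprint(origin, [(r, 0.0, 0.0)]) for r in separations]
train_y = [1.0 / r ** 2 for r in separations]

query = radial_fingerprint(origin, [(1.55, 0.0, 0.0)])
prediction = knn_predict(train_x, train_y, query, k=2)
print(prediction)
```

The only design choice k-NN requires is a distance in fingerprint space (Euclidean here), so the quality of the model depends almost entirely on how well the fingerprint separates distinct atomic environments.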

Updated: 2018-02-20