The role of feature space in atomistic learning
Machine Learning: Science and Technology (IF 6.3), Pub Date: 2021-04-15, DOI: 10.1088/2632-2153/abdaf7
Alexander Goscinski, Guillaume Fraux, Giulio Imbalzano, Michele Ceriotti

Efficient, physically-inspired descriptors of the structure and composition of molecules and materials play a key role in the application of machine-learning techniques to atomistic simulations. The proliferation of approaches, together with the fact that each choice of features can behave very differently depending on how it is used (e.g. when non-linear kernels and non-Euclidean metrics are introduced to manipulate the features), makes it difficult to compare different methods objectively and to address fundamental questions about how one feature space is related to another. In this work we introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels, in terms of the structure of the feature space that they induce. We define diagnostic tools to determine whether alternative feature spaces contain equivalent amounts of information, and whether the common information is substantially distorted when going from one feature space to another. We compare, in particular, representations that are built in terms of n-body correlations of the atom density, quantitatively assessing the information loss associated with the use of low-order features. We also investigate the impact of different choices of basis functions and hyperparameters of the widely used SOAP and Behler–Parrinello features, and examine how the use of non-linear kernels, and of a Wasserstein-type metric, changes the structure of the feature space in comparison to a simpler linear feature space.
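
The abstract does not spell out how the diagnostic tools are computed. As a rough illustration only, one plausible way to ask whether a feature space contains the information of another is to fit a linear map from the first to the second and measure the reconstruction error on held-out samples. The sketch below assumes this linear-regression reading; the function name `reconstruction_error`, the train/test split, the standardisation and the ridge parameter are all hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np


def reconstruction_error(F_src, F_tgt, train_frac=0.5, ridge=1e-10, seed=0):
    """Held-out error of linearly reconstructing F_tgt from F_src.

    Hypothetical helper for illustration. Both inputs are arrays of shape
    (n_samples, n_features) describing the same samples in the same order.
    """
    rng = np.random.default_rng(seed)
    n = F_src.shape[0]
    perm = rng.permutation(n)
    n_train = int(train_frac * n)
    train, test = perm[:n_train], perm[n_train:]

    def standardise(F):
        # Centre each feature and rescale the whole feature set so that its
        # total variance on the training samples is 1 (assumed convention).
        mean = F[train].mean(axis=0)
        scale = np.sqrt(((F[train] - mean) ** 2).sum() / n_train)
        return (F - mean) / scale

    X, Y = standardise(F_src), standardise(F_tgt)

    # Ridge-regularised least squares fitted on the training samples only.
    A = X[train].T @ X[train] + ridge * np.eye(X.shape[1])
    W = np.linalg.solve(A, X[train].T @ Y[train])

    # Root-mean-square residual on the held-out samples.
    resid = Y[test] - X[test] @ W
    return np.sqrt((resid ** 2).sum() / len(test))


# Toy data: a "full" feature set and a "reduced" one missing a column.
rng = np.random.default_rng(42)
F_full = rng.normal(size=(200, 3))
F_reduced = F_full[:, :2]

err_full_to_reduced = reconstruction_error(F_full, F_reduced)
err_reduced_to_full = reconstruction_error(F_reduced, F_full)
print(err_full_to_reduced, err_reduced_to_full)
```

In this toy setting the error is asymmetric: the reduced features can be recovered from the full ones almost exactly, while the reverse direction leaves a sizeable residual, which mirrors the kind of information loss the abstract associates with low-order features. A measure of distortion, as opposed to information content, would need a separate construction that this sketch does not attempt.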




Updated: 2021-04-15