Multiscale Modeling at the Interface of Molecular Mechanics and Natural Language through Attention Neural Networks
Accounts of Chemical Research (IF 18.3), Pub Date: 2022-11-15, DOI: 10.1021/acs.accounts.2c00330
Markus J. Buehler

Humans are continually bombarded with massive amounts of data. To deal with this influx of information, we use the concept of attention to perceive the most relevant input from vision, hearing, touch, and other senses. The complex ensemble of signals is thereby used to generate output by querying the processed data in appropriate ways. Attention is also the hallmark of the development of scientific theories, where we elucidate which parts of a problem are critical, often expressed through differential equations. In this Account we review the emergence of attention-based neural networks as a class of approaches that offer many opportunities to describe materials across scales and modalities, including how universal building blocks interact to yield a set of material properties. In fact, the self-assembly of hierarchical, structurally complex, and multifunctional biomaterials remains a grand challenge in modeling, theory, and experiment. Expanding from the process by which material building blocks physically interact to form a material, we view self-assembly as both the functional emergence of properties from interacting building blocks and the physical process by which elementary building blocks interact to yield structure and, thereby, function. This perspective, integrated through the theory of materiomics, allows us to solve multiscale problems with a first-principles computational approach built on attention-based neural networks that transform information to features to properties, while providing a flexible modeling framework that can integrate theory, simulation, and experiment. Since these models are based on a natural language framework, they offer various benefits, including the incorporation of general domain knowledge via general-purpose pretraining, which can be accomplished without labeled data, or with large amounts of lower-quality data. Pretrained models then offer a general-purpose platform that can be fine-tuned to make specific predictions, often with relatively little labeled data. The transferable power of the language-based modeling approach realizes a neural olog description, where mathematical categorization is learned by multiheaded attention, without domain knowledge in its formulation. It can hence be applied to a range of complex modeling tasks, such as physical field prediction, molecular property prediction, or structure prediction, all using an identical formulation. This offers a complementary modeling approach that is already finding numerous applications, with great potential to solve complex assembly problems, enabling us to learn, build, and utilize a functional categorization of how building blocks yield a range of material functions. In this Account, we demonstrate the approach in several application areas, including protein secondary structure prediction, prediction of normal-mode frequencies, and prediction of mechanical fields near cracks. Unifying these diverse problem areas is the building-block approach, where the models rest on a universally applicable platform whose benefits include transferability, interpretability, and cross-domain pollination of knowledge, exemplified by a transformer model that predicts de novo protein structures from musical compositions. We discuss the future potential of this approach for a variety of material phenomena across scales, including its use in multiparadigm modeling schemes.
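To make the core mechanism concrete, the following is a minimal sketch of the multiheaded self-attention operation at the heart of such transformer models, written in PyTorch. It is an illustrative implementation only, not the models used in this Account; the layer sizes, single-layer setup, and toy input are assumptions chosen for clarity.

```python
# Minimal sketch of scaled dot-product multi-head self-attention.
# Illustrative only; shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Learned projections producing queries, keys, and values
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, d_model), e.g. an embedded
        # amino-acid sequence or any other tokenized "material language"
        B, L, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (B, n_heads, L, d_head)
        q, k, v = (t.view(B, L, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        # Attention weights: every token queries every other token
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        # Weighted sum of values, then merge the heads back together
        y = (attn @ v).transpose(1, 2).reshape(B, L, D)
        return self.out(y)

# Example: attend over a toy batch of two 10-token sequences
x = torch.randn(2, 10, 64)
print(MultiHeadSelfAttention()(x).shape)  # torch.Size([2, 10, 64])
```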
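The pretrain-then-fine-tune workflow described above can likewise be sketched in a few lines: a general-purpose encoder is frozen after pretraining, and only a small task head is fit on relatively few labeled examples. Everything below is a hypothetical stand-in assuming a per-residue three-class secondary-structure task; `pretrained_encoder`, the head, and the toy data are not from the original work.

```python
# Hedged sketch of the pretrain/fine-tune workflow; all names are
# hypothetical stand-ins, not the models used in this Account.
import torch
import torch.nn as nn

d_model, n_classes = 64, 3          # e.g. helix / sheet / coil

# Stand-in for a transformer encoder obtained from general-purpose
# pretraining on unlabeled sequence data.
pretrained_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)
pretrained_encoder.eval()
for p in pretrained_encoder.parameters():
    p.requires_grad = False         # keep the general knowledge frozen

# Lightweight per-token classification head fine-tuned on labeled data
head = nn.Linear(d_model, n_classes)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy labeled batch: 4 embedded sequences of length 20, per-residue labels
x = torch.randn(4, 20, d_model)
y = torch.randint(0, n_classes, (4, 20))

for _ in range(5):                  # a few fine-tuning steps
    logits = head(pretrained_encoder(x))            # (4, 20, n_classes)
    loss = loss_fn(logits.reshape(-1, n_classes), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```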
