Automation of some macromolecular properties using a machine learning approach, Machine Learning: Science and Technology

Machine Learning: Science and Technology (IF 6.3), Pub Date: 2021-06-16, DOI: 10.1088/2632-2153/abe7b6
Merjem Hoxha, Hiqmet Kamberaj

In this study, we employed a newly developed method to predict macromolecular properties using a swarm artificial neural network (ANN) as the machine learning approach. In this method, molecular structures are represented by feature description vectors, which serve as the training input for the neural network. The study aims to develop an efficient approach for training an ANN on either experimental or quantum mechanics data, and to introduce an error model that controls the reliability of the prediction confidence interval using a bootstrapping swarm approach. We created separate datasets from selected experimental or quantum mechanics results. Using the optimized ANN, we hope to predict properties, together with their statistical errors, for new molecules. Four datasets are used in this study: 642 small organic molecules with known experimental hydration free energies; 1475 experimental pKa values of ionizable groups in 192 proteins; 2693 mutants in 14 proteins with experimental values of the change in Gibbs free energy; and 7101 quantum mechanics calculations of heats of formation. All data were prepared and optimized using the AMBER force field in the CHARMM macromolecular simulation program. The bootstrapping swarm ANN code that performs the optimization and prediction is written in Python. The descriptor vectors of the small molecules are based on the Coulomb matrix and a sum over bond properties. For the macromolecular systems, the descriptors encode the chemical-physical fingerprints of the region in the vicinity of each amino acid.
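The abstract names two ingredients: Coulomb-matrix descriptors for small molecules and a bootstrapping "swarm" whose spread supplies the statistical error of each prediction. The pure-Python sketch below illustrates both under stated assumptions; it is not the authors' code. It uses the commonly used Coulomb-matrix definition (0.5 Z_i^2.4 on the diagonal, Z_i Z_j / |R_i - R_j| off it), sorted by row norm and zero-padded to a fixed size, which may differ from the paper's exact descriptor. To keep it short, a k-nearest-neighbour regressor stands in for each ANN swarm member. The names `coulomb_matrix`, `knn_predict` and `bootstrap_predict`, the parameters `n_max`, `n_models`, `k`, `seed`, and the toy data are all hypothetical.

```python
import math
import random
import statistics


def coulomb_matrix(charges, coords, n_max):
    """Sorted, zero-padded Coulomb matrix flattened to its upper triangle.

    M_ii = 0.5 * Z_i**2.4 and M_ij = Z_i * Z_j / |R_i - R_j| (i != j).
    Atoms are reordered by descending row norm so the result does not depend
    on the input atom order; zero-padding to n_max atoms gives every molecule
    a descriptor of the same length, n_max * (n_max + 1) / 2.
    """
    n = len(charges)
    if n > n_max:
        raise ValueError("molecule has more atoms than n_max")
    m = [[0.5 * charges[i] ** 2.4 if i == j
          else charges[i] * charges[j] / math.dist(coords[i], coords[j])
          for j in range(n)] for i in range(n)]
    order = sorted(range(n), key=lambda i: math.hypot(*m[i]), reverse=True)
    padded = [[0.0] * n_max for _ in range(n_max)]
    for a, i in enumerate(order):
        for b, j in enumerate(order):
            padded[a][b] = m[i][j]
    return [padded[a][b] for a in range(n_max) for b in range(a, n_max)]


def knn_predict(train_x, train_y, query, k=3):
    """Hypothetical stand-in regressor (k-nearest neighbours), NOT an ANN:
    averages the targets of the k training vectors closest to the query."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: math.dist(train_x[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def bootstrap_predict(train_x, train_y, query, n_models=50, k=3, seed=0):
    """Bootstrap ensemble: each member is fitted to a resample of the
    training set drawn with replacement. Returns the mean of the members'
    predictions and their standard deviation as the error estimate.
    n_models must be at least 2 for the standard deviation to exist."""
    rng = random.Random(seed)
    n = len(train_x)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [train_x[i] for i in idx]
        ys = [train_y[i] for i in idx]
        preds.append(knn_predict(xs, ys, query, k))
    return statistics.mean(preds), statistics.stdev(preds)


if __name__ == "__main__":
    # Toy usage with synthetic, made-up targets (not data from the study):
    # C-O "molecules" whose target is simply twice the bond length.
    X, y = [], []
    for i in range(31):
        d = 0.9 + 0.02 * i
        X.append(coulomb_matrix([6, 8], [(0, 0, 0), (d, 0, 0)], n_max=3))
        y.append(2.0 * d)
    q = coulomb_matrix([6, 8], [(0, 0, 0), (1.2, 0, 0)], n_max=3)
    mean, err = bootstrap_predict(X, y, q)
    print(f"prediction = {mean:.3f} +/- {err:.3f}")
```

Swapping `knn_predict` for a trained network would turn this into an ensemble of ANNs. The structure stays the same: one model per bootstrap resample, and the spread of the members' outputs serves as the confidence estimate. How closely this matches the paper's "swarm" is an assumption.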

Updated: 2021-06-16