Quantitative structure-activity relationship (QSAR) models and their applicability domain analysis on HIV-1 protease inhibitors by machine learning methods
Chemometrics and Intelligent Laboratory Systems (IF 3.7), Pub Date: 2020-01-01, DOI: 10.1016/j.chemolab.2019.103888
Yujia Tian, Shengde Zhang, Hongyan Yin, Aixia Yan

Abstract: HIV-1 protease inhibitors (PIs) make a vital contribution to highly active antiretroviral therapy (HAART) against human immunodeficiency virus (HIV). In this study, 14 quantitative structure-activity relationship (QSAR) models were built on 1238 PIs using four machine learning methods: multiple linear regression (MLR), support vector machine (SVM), random forest (RF) and deep neural networks (DNN). The best model, Model2G, built with the DNN algorithm, achieved coefficients of determination (R²) of 0.88 and 0.79 and root mean squared errors (RMSE) of 0.39 and 0.51 on the training set and test set, respectively. For Model2G, an applicability domain threshold (ADT) of 1.765 was obtained from the training set: a compound whose similarity distance (d) is less than the ADT is considered to lie inside the applicability domain and can be predicted accurately. On this basis, the predictions for 65.37% of the compounds in the test set were considered reliable. In addition, the 1238 PIs were manually divided into eight subsets with different scaffolds. Hydroxylamine derivatives and seven-membered cyclic urea derivatives showed high inhibitory activity compared with the other subsets. We also built QSAR models with SVM, RF and DNN on two of these subsets: 299 hydroxylamine derivative inhibitors (Dataset2) and 377 seven-membered cyclic urea derivative inhibitors (Dataset3). For the best model on Dataset2, Model3A, the test set gave an R² of 0.71 and an RMSE of 0.53. For the best model on Dataset3, Model4B, the test set gave an R² of 0.82 and an RMSE of 0.51. Finally, we analyzed the descriptors that contribute significantly to the bioactivity of inhibitors in these two subsets. Highly active seven-membered cyclic urea inhibitors usually contained several aromatic nitrogen-heterocyclic substituents such as imidazole and pyrazole. The oxazolidinone group and sulfanilamide group appeared mainly in highly active hydroxylamine derivative inhibitors. These observations may further aid the design of promising HIV-1 protease inhibitors.
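The abstract reports model quality as R² and RMSE and screens test compounds against an applicability domain threshold. The sketch below illustrates how these quantities are commonly computed, assuming plain Python lists of descriptor vectors and observed/predicted activity values. The abstract does not define d or the ADT, so the distance rule used here is an assumption: d is taken as the mean Euclidean distance to the k nearest training compounds, and the ADT as the mean plus Z times the standard deviation of d over the training set, with k = 3 and Z = 0.5 as illustrative defaults. This is one widely used QSAR convention, not necessarily the authors' method, and all function names are hypothetical. Requires Python 3.8+ for `math.dist`.

```python
import math
from statistics import mean, pstdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted activities."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mean_knn_distance(x, reference, k):
    """Assumed similarity distance d: mean Euclidean distance from x
    to its k nearest neighbours in `reference`."""
    dists = sorted(math.dist(x, r) for r in reference)
    return mean(dists[:k])


def applicability_domain_threshold(train_x, k=3, z=0.5):
    """Assumed ADT rule: mean + z * population std of each training
    compound's d, computed against the other training compounds."""
    d_train = [
        mean_knn_distance(x, train_x[:i] + train_x[i + 1:], k)
        for i, x in enumerate(train_x)
    ]
    return mean(d_train) + z * pstdev(d_train)


def in_domain(x, train_x, adt, k=3):
    """A compound is inside the applicability domain when d < ADT."""
    return mean_knn_distance(x, train_x, k) < adt


if __name__ == "__main__":
    y_obs = [1.0, 2.0, 3.0, 4.0]
    y_hat = [1.1, 1.9, 3.2, 3.8]
    print(round(r2_score(y_obs, y_hat), 3))  # 0.98
    print(round(rmse(y_obs, y_hat), 3))      # 0.158

    # Toy 2-D "descriptor" space: four training compounds on a unit square.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    adt = applicability_domain_threshold(train, k=1)
    print(adt)                                      # 1.0
    print(in_domain((0.5, 0.5), train, adt, k=1))   # True
    print(in_domain((5.0, 5.0), train, adt, k=1))   # False
```

In practice, descriptors are usually standardized before distances are computed so that no single descriptor dominates d; that step is omitted here for brevity. The fraction of test compounds for which `in_domain` returns `True` corresponds to the kind of coverage figure the abstract reports (65.37% for Model2G).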

Updated: 2020-01-01