Using machine learning to improve ensemble docking for drug discovery.
Proteins: Structure, Function, and Bioinformatics (IF 3.2), Pub Date: 2020-05-13, DOI: 10.1002/prot.25899
Tanay Chandak, John P Mayginnes, Howard Mayes, Chung F Wong

Ensemble docking provides an inexpensive way to account for receptor flexibility in molecular docking for virtual screening. Unfortunately, because no rigorous theory connects the docking scores from multiple structures to measured activity, researchers have not yet found effective ways to use these scores to classify compounds as active or inactive. This shortcoming has led to a decrease, rather than an increase, in classification performance when more structures are added to the ensemble. Previously, we suggested that machine learning, implemented as a naïve Bayesian model, could alleviate this problem. However, the naïve Bayesian model assumes that the probabilities of observing the docking scores for different structures are independent, an approximation that might prevent it from achieving even higher performance. In the work presented in this paper, we relaxed this approximation by using several other machine learning methods (k nearest neighbor, logistic regression, support vector machine, and random forest) to improve ensemble docking. We found significant improvement.
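
To make the independence assumption concrete, here is a minimal sketch, not the authors' implementation. It treats each compound's docking scores against the ensemble as a feature vector and classifies it in two ways: a Gaussian naïve Bayes model, which sums per-structure log-likelihoods and so treats the scores as independent, and a k-nearest-neighbor vote, which uses the whole score vector jointly. The function names, the two-structure ensemble, and the toy scores are all invented for illustration; the abstract does not say which software or parameters were used.

```python
import math
from collections import Counter

# Hypothetical toy data: each row holds one compound's docking scores
# against two receptor structures (more negative = better predicted binding).
X = [
    (-9.0, -8.5), (-8.5, -9.0), (-9.5, -9.0),   # labelled active (1)
    (-5.0, -5.5), (-5.5, -5.0), (-6.0, -5.5),   # labelled inactive (0)
]
y = [1, 1, 1, 0, 0, 0]


def gaussian_nb_fit(X, y):
    """For each class, store its prior and one (mean, variance) pair per structure."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):  # iterate over structures (columns)
            mean = sum(col) / len(col)
            # small constant keeps the variance strictly positive
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def gaussian_nb_predict(model, x):
    """Naive Bayes: per-structure log-likelihoods are simply added,
    which is exactly the independence approximation."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def knn_predict(X, y, x, k=3):
    """k nearest neighbors: the distance is computed on the full score
    vector, so no independence between structures is assumed."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


model = gaussian_nb_fit(X, y)
query = (-9.2, -8.8)  # hypothetical new compound
print(gaussian_nb_predict(model, query), knn_predict(X, y, query))
```

On this cleanly separated toy set both classifiers label the query compound as active, so the sketch only shows where the two approaches differ structurally; it says nothing about their relative performance on real docking data.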

Updated: 2020-05-13