The impact of compound library size on the performance of scoring functions for structure-based virtual screening. Briefings in Bioinformatics

The impact of compound library size on the performance of scoring functions for structure-based virtual screening.
Briefings in Bioinformatics (IF 6.8), Pub Date: 2020-06-22, DOI: 10.1093/bib/bbaa095
Louison Fresnais, Pedro J Ballester

Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
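The abstract's two quantitative ideas, hit rate among top-ranked compounds and different scoring functions retrieving different actives, can be illustrated with a minimal sketch. This is not the authors' protocol: the compound IDs, scores, and activity labels are made up, the helper names `top_ranked` and `hit_rate` are hypothetical, and the sketch assumes that a higher score means a better-ranked compound (some docking tools use the opposite convention).

```python
from typing import Dict, List, Set


def top_ranked(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n compound IDs with the highest scores (assumed: higher = better)."""
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)[:n]


def hit_rate(selected: List[str], actives: Set[str]) -> float:
    """Fraction of the selected compounds that are labelled active."""
    if not selected:
        return 0.0
    return sum(cid in actives for cid in selected) / len(selected)


# Toy, made-up data: six compounds, two of which are labelled active.
actives = {"c1", "c4"}
classical_sf = {"c1": 7.1, "c2": 6.9, "c3": 6.5, "c4": 5.0, "c5": 4.2, "c6": 3.9}
ml_sf = {"c1": 5.5, "c2": 4.0, "c3": 6.0, "c4": 8.2, "c5": 3.1, "c6": 2.7}

# Select the two best-ranked compounds under each scoring function.
top_classical = top_ranked(classical_sf, 2)
top_ml = top_ranked(ml_sf, 2)

# Actives retrieved by each scoring function, and by the two combined.
found_classical = set(top_classical) & actives
found_ml = set(top_ml) & actives
found_either = found_classical | found_ml

print("classical hit rate:", hit_rate(top_classical, actives))
print("ML hit rate:", hit_rate(top_ml, actives))
print("actives found by either SF:", sorted(found_either))
```

In this toy case each scoring function retrieves one active in its top two, but not the same one, so taking the union of both selections recovers both actives. That is the kind of complementarity the abstract refers to when it supports using both types of SFs.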

Updated: 2020-06-22
