Beware of the generic machine learning-based scoring functions in structure-based virtual screening

Briefings in Bioinformatics (IF 6.8), Pub Date: 2020-06-02, DOI: 10.1093/bib/bbaa070
Chao Shen, Ye Hu, Zhe Wang, Xujun Zhang, Jinping Pang, Gaoang Wang, Haiyang Zhong, Lei Xu, Dongsheng Cao, Tingjun Hou

Machine learning-based scoring functions (MLSFs) have attracted extensive attention recently and are expected to become useful rescoring tools for structure-based virtual screening (SBVS). However, a major open question is whether MLSFs trained for generic use, rather than for a given target, can be applied consistently to VS. In this study, a systematic assessment was carried out to re-evaluate the effectiveness of 14 reported MLSFs in VS. Overall, most of these MLSFs could hardly achieve satisfactory results on any dataset, and they could not even outperform the baseline of classical SFs such as Glide SP. An exception was RFscore-VS trained on the Directory of Useful Decoys-Enhanced (DUD-E) dataset, which showed superior performance for most targets. However, in most cases, it showed rather limited performance on targets that were dissimilar to the proteins in the corresponding training sets. We also rescored the top three docking poses rather than only the top one, and retrained the models with updated versions of the training sets, but observed only minor improvements. Taken together, generic MLSFs may generalize too poorly to be applicable in real VS campaigns. Therefore, this type of method should be used for VS with considerable caution.
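As an illustration of comparing top-one and top-three pose rescoring against a ranked screening library, here is a minimal Python sketch. It assumes each compound carries one MLSF score per docking pose (higher meaning more likely active), combines the poses by taking the maximum score, and measures screening performance with an enrichment factor (EF) at a fixed fraction of the ranked library. The `Compound` class, the max aggregation, the choice of EF as the metric, and the toy scores are all hypothetical illustrations, not the paper's actual protocol or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    """A screened molecule with hypothetical MLSF scores for its docking poses."""
    name: str
    is_active: bool
    pose_scores: List[float]  # ordered by the docking program's pose rank; higher = more likely active


def compound_score(compound: Compound, top_k: int) -> float:
    """Combine the rescoring results of the top_k poses (taking the max is an assumption)."""
    poses = compound.pose_scores[:top_k]
    if not poses:
        raise ValueError(f"{compound.name} has no docking poses")
    return max(poses)


def enrichment_factor(compounds: List[Compound], top_k: int, fraction: float = 0.01) -> float:
    """EF at `fraction`: hit rate in the top-ranked subset divided by the overall hit rate."""
    ranked = sorted(compounds, key=lambda c: compound_score(c, top_k), reverse=True)
    n_total = len(ranked)
    n_actives = sum(c.is_active for c in ranked)
    if n_actives == 0:
        raise ValueError("the library contains no actives")
    n_selected = max(1, round(n_total * fraction))
    hits = sum(c.is_active for c in ranked[:n_selected])
    return (hits / n_selected) / (n_actives / n_total)


# Toy library: two actives and three decoys with made-up scores for three poses each.
library = [
    Compound("active_1", True, [0.90, 0.20, 0.10]),
    Compound("active_2", True, [0.30, 0.95, 0.40]),
    Compound("decoy_1", False, [0.80, 0.10, 0.10]),
    Compound("decoy_2", False, [0.50, 0.60, 0.20]),
    Compound("decoy_3", False, [0.10, 0.20, 0.30]),
]

for k in (1, 3):
    ef = enrichment_factor(library, top_k=k, fraction=0.4)
    print(f"top-{k} poses: EF40% = {ef:.2f}")
# top-1 poses: EF40% = 1.25
# top-3 poses: EF40% = 2.50
```

The 40% fraction is only there because the toy library has five compounds; benchmark libraries contain thousands of compounds, so much smaller fractions (such as 1%) are typical. Whether max, mean, or another rule is the right way to combine pose scores is a design choice this sketch does not settle.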

Updated: 2020-06-02