TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions, Journal of Medicinal Chemistry



TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions
Journal of Medicinal Chemistry (IF 7.3), Pub Date: 2022-06-01, DOI: 10.1021/acs.jmedchem.2c00460
Xujun Zhang¹,²,³, Chao Shen¹, Ben Liao³, Dejun Jiang¹, Jike Wang¹,⁴, Zhenxing Wu¹, Hongyan Du¹, Tianyue Wang¹, Wenbo Huo⁵, Lei Xu⁶, Dongsheng Cao⁷, Chang-Yu Hsieh³, Tingjun Hou¹,²


Developing accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large, unbiased dataset with structurally diverse actives and decoys. However, most datasets used to develop MLSFs were designed for traditional scoring functions (SFs) and may suffer from hidden biases and data insufficiency. Here, we developed a new approach named Topology-based and Conformation-based decoys generation (TocoDecoy), which integrates two strategies that generate decoys by tweaking the actives for a specific target, yielding unbiased and expandable datasets for training and benchmarking MLSFs. To evaluate hidden bias, we assessed the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets. The results show that the IGN model trained on the TocoDecoy dataset is competitive with the one trained on the LIT-PCBA dataset and remarkably outperforms the one trained on the DUD-E dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.


Updated: 2022-06-01
