The impact of cross-docked poses on performance of machine learning classifier for protein–ligand binding pose prediction
Journal of Cheminformatics (IF 7.1), Pub Date: 2021-10-16, DOI: 10.1186/s13321-021-00560-w
Chao Shen 1,2, Xueping Hu 1, Junbo Gao 1, Xujun Zhang 1, Haiyang Zhong 1, Zhe Wang 1, Lei Xu 3, Yu Kang 1, Dongsheng Cao 4, Tingjun Hou 1,2

Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein–ligand complexes, but accurate prediction of ligand-binding poses remains a major challenge for molecular docking, owing to the deficiencies of scoring functions (SFs) and the neglect of protein flexibility upon ligand binding. In this study, using a cross-docking dataset purpose-built from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without cross-docked poses in the training and test sets. The results show that using Extended Connectivity Interaction Features (ECIF), Vina energy terms, and docking pose ranks as features achieves the best performance, both in validation by random splitting or refined-core splitting and in testing on re-docked or cross-docked poses. In addition, although performance decreases significantly under threefold clustered cross-validation, including the Vina energy terms effectively secures a lower bound on model performance and thus improves generalization capability. Our results also highlight the importance of incorporating cross-docked poses into the training of SFs intended to have a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936, respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for predicting protein–ligand binding poses.
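
The two evaluation ideas mentioned in the abstract, clustered cross-validation and discrimination of near-native poses from decoys, can be illustrated with a minimal pure-Python sketch. It is not the authors' code (their implementation is in the linked repository). Everything specific here is an assumption for illustration: the function names `clustered_folds` and `top1_success_rate`, the greedy fold-balancing rule, the 2.0 Å RMSD cutoff for "near-native" (a common convention, not stated in the abstract), and the convention that a higher classifier score means a more likely near-native pose.

```python
from collections import defaultdict


def clustered_folds(cluster_ids, n_folds=3):
    """Assign each sample to a fold so that every cluster stays in one fold.

    cluster_ids: one (hypothetical) cluster label per sample, e.g. a protein
    family. Keeping clusters intact prevents similar proteins from appearing
    in both the training and the test split.
    """
    sizes = defaultdict(int)
    for c in cluster_ids:
        sizes[c] += 1

    # Greedy balancing (an illustrative choice): place the largest clusters
    # first, each into the currently smallest fold.
    fold_of_cluster = {}
    fold_sizes = [0] * n_folds
    for c, n in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        k = fold_sizes.index(min(fold_sizes))
        fold_of_cluster[c] = k
        fold_sizes[k] += n
    return [fold_of_cluster[c] for c in cluster_ids]


def top1_success_rate(poses, rmsd_cutoff=2.0):
    """Fraction of complexes whose best-scored pose is near-native.

    poses: iterable of (complex_id, score, rmsd) tuples, where score is the
    classifier output (assumed higher = better) and rmsd is the deviation
    from the crystal pose in angstroms. The 2.0 cutoff is an assumption.
    """
    best = {}
    for cid, score, rmsd in poses:
        if cid not in best or score > best[cid][0]:
            best[cid] = (score, rmsd)
    hits = sum(1 for _, rmsd in best.values() if rmsd <= rmsd_cutoff)
    return hits / len(best)


# Toy data, made up for illustration only.
clusters = ["kinase", "kinase", "kinase", "protease", "protease",
            "gpcr", "nuclear", "nuclear"]
folds = clustered_folds(clusters, n_folds=3)

toy_poses = [
    ("A", 0.9, 1.2),  # top-scored pose of complex A is near-native
    ("A", 0.4, 6.0),
    ("B", 0.3, 1.0),
    ("B", 0.8, 5.5),  # top-scored pose of complex B is a decoy
]
rate = top1_success_rate(toy_poses)
```

Splitting by cluster rather than at random is what makes this kind of validation harder: the model is tested on protein groups it has never seen, which is consistent with the performance drop the abstract reports for clustered cross-validation.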

Updated: 2021-10-17