Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-01-30, DOI: 10.1186/s13321-023-00799-5
Jiangxia Wu, Yihao Chen, Jingxing Wu, Duancheng Zhao, Jindi Huang, MuJie Lin, Ling Wang

Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profile of compounds, but the relative advantages and disadvantages of ML and DL for such tasks remain controversial. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors comprising 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal that (1) descriptor-based ML models generally slightly outperform fingerprint-based ML models in terms of predictive performance, and random forest (RF), an ensemble learning approach, displays the best overall predictive performance; (2) single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models, but the corresponding multi-task models generally improve the average accuracy of kinase profile prediction. For example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807; (3) fusion models based on voting and stacking methods can further improve performance on the kinase profiling prediction task. Specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC of 0.825 on the test sets. These findings provide useful guidance for choosing ML and DL methods for kinase profiling prediction tasks. Finally, an online platform called KIPP (https://kipp.idruglab.cn) and Python software were developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning and target fishing.
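
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of two ideas it mentions: the average AUC across kinases and a voting-style fusion of several models. It is not the authors' implementation. The abstract does not specify the voting rule, so simple score averaging (soft voting) is assumed here, and stacking is not shown. All function names, kinase names, and scores are hypothetical, and the toy numbers are made up purely for illustration.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_vote(*model_scores):
    """Fuse models by averaging their predicted scores compound by compound."""
    return [mean(column) for column in zip(*model_scores)]


def average_auc(labels_by_kinase, scores_by_kinase):
    """Mean of the per-kinase AUC values (one binary task per kinase)."""
    return mean(
        roc_auc(labels_by_kinase[k], scores_by_kinase[k]) for k in labels_by_kinase
    )


# Made-up labels (1 = active, 0 = inactive) for two hypothetical kinases.
labels = {"kinase_A": [1, 1, 0, 0], "kinase_B": [1, 0, 1, 0]}

# Made-up predicted scores from two hypothetical models, e.g. one trained on
# fingerprints and one on descriptors. Each misranks a different compound.
scores_fp = {"kinase_A": [0.9, 0.4, 0.5, 0.1], "kinase_B": [0.8, 0.6, 0.3, 0.2]}
scores_desc = {"kinase_A": [0.5, 0.8, 0.3, 0.6], "kinase_B": [0.4, 0.1, 0.9, 0.5]}

# Voting fusion: average the two models' scores for every kinase.
fused = {k: soft_vote(scores_fp[k], scores_desc[k]) for k in labels}

print("fingerprint model:", average_auc(labels, scores_fp))
print("descriptor model: ", average_auc(labels, scores_desc))
print("fused (soft vote):", average_auc(labels, fused))
```

On this constructed toy data each single model reaches an average AUC of 0.75 while the fused scores reach 1.0, because the two models make different ranking errors that cancel when averaged. This only illustrates why fusion can help; it says nothing about the size of the effect reported in the paper.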

Updated: 2024-01-31