Development and rigorous validation of antimalarial predictive models using machine learning approaches.
SAR and QSAR in Environmental Research (IF 3), Pub Date: 2019-07-22, DOI: 10.1080/1062936x.2019.1635526
Danishuddin, G Madhukar, M Z Malik, N Subbarao

A large collection of known, experimentally verified compounds from the ChEMBL database was used to build classification models for predicting antimalarial activity against Plasmodium falciparum. Four machine learning methods, namely support vector machine (SVM), random forest (RF), k-nearest neighbour (kNN) and XGBoost, were used to develop models on this diverse antimalarial dataset. A well-established feature selection framework was used to select the best subset from a larger pool of descriptors. Model performance was rigorously evaluated using the applicability domain, Y-scrambling and the AUC-ROC curve. The predictive power of the models was additionally assessed using probability calibration and predictiveness curves. SVM and XGBoost performed best, yielding an accuracy of ~85% on the independent test set. In terms of probability prediction, SVM and XGBoost were well calibrated. Total gain (TG) from the predictiveness curve was highest for SVM (TG = 0.67) and XGBoost (TG = 0.75). These models also assigned high probability scores to the high-affinity compounds from a PubChem antimalarial bioassay, which served as external validation. Our findings suggest that the selected models are robust and potentially useful for facilitating the discovery of antimalarial agents.
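The Y-scrambling check named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic two-dimensional points rather than ChEMBL descriptors, a small pure-Python kNN stands in for the four models, and every function name and parameter (`make_toy_data`, `y_scrambling`, `k=3`, `n_rounds=20`) is an assumption made for illustration. The idea is that a model trained on shuffled activity labels should fall to roughly chance accuracy; if it does not, the original score was probably a product of chance correlation.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose kNN prediction matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def make_toy_data(n, rng):
    """Synthetic stand-in for a descriptor matrix: class 0 near (0, 0), class 1 near (2, 2)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 2.0 * label
        X.append((rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)))
        y.append(label)
    return X, y


def y_scrambling(train_X, train_y, test_X, test_y, n_rounds=20, seed=0):
    """Return (accuracy with true labels, mean accuracy over label-shuffled rounds)."""
    rng = random.Random(seed)
    true_acc = accuracy(train_X, train_y, test_X, test_y)
    scrambled = []
    for _ in range(n_rounds):
        shuffled = train_y[:]      # copy so the real labels stay intact
        rng.shuffle(shuffled)      # break the descriptor-activity link
        scrambled.append(accuracy(train_X, shuffled, test_X, test_y))
    return true_acc, sum(scrambled) / len(scrambled)


rng = random.Random(42)
train_X, train_y = make_toy_data(200, rng)
test_X, test_y = make_toy_data(100, rng)
true_acc, scrambled_acc = y_scrambling(train_X, train_y, test_X, test_y)
print(f"true labels: {true_acc:.2f}, scrambled labels (mean): {scrambled_acc:.2f}")
```

On real data the same comparison would be run with the actual descriptor matrix and each trained model; a large gap between the two accuracies is the outcome that supports robustness.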




Updated: 2019-07-22