ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning
Journal of Cheminformatics ( IF 8.6 ) Pub Date : 2020-03-05 , DOI: 10.1186/s13321-020-00421-y
Dejun Jiang , Tailong Lei , Zhe Wang , Chao Shen , Dongsheng Cao , Tingjun Hou

Breast cancer resistance protein (BCRP/ABCG2), an ATP-binding cassette (ABC) efflux transporter, plays a critical role in multi-drug resistance (MDR) to anti-cancer drugs and in drug–drug interactions. Predicting BCRP inhibition can help evaluate potential drug resistance and drug–drug interactions in the early stages of drug discovery. Here we report a structurally diverse dataset consisting of 1098 BCRP inhibitors and 1701 non-inhibitors. Analysis of various physicochemical properties shows that BCRP inhibitors are more hydrophobic and aromatic than non-inhibitors. We then developed a series of quantitative structure–activity relationship (QSAR) models to discriminate between BCRP inhibitors and non-inhibitors. The optimal feature subset was determined by a wrapper feature selection method named rfSA (a simulated annealing algorithm coupled with random forest), and classification models were built on this subset using seven machine learning approaches: one deep learning method, two ensemble learning methods, and four classical machine learning methods. The statistical results demonstrated that three methods, support vector machine (SVM), deep neural networks (DNN) and extreme gradient boosting (XGBoost), outperformed the others, and the SVM classifier yielded the best predictions (MCC = 0.812 and AUC = 0.958 for the test set). A perturbation-based model-agnostic method was then used to interpret the models and analyze the representative features for each of them. The application domain analysis demonstrated the prediction reliability of our models. Moreover, the important structural fragments related to BCRP inhibition were identified by the information gain (IG) method along with frequency analysis.
In conclusion, we believe that the classification models developed in this study can be regarded as simple and accurate tools to distinguish BCRP inhibitors from non-inhibitors in drug design and discovery pipelines.
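
The abstract reports model quality as the Matthews correlation coefficient (MCC) and ranks structural fragments by information gain (IG). As a minimal sketch of how these two quantities are commonly computed for a binary inhibitor/non-inhibitor task, the Python below uses only the standard library. The function names and the toy inputs are assumptions made for illustration; they are not taken from the paper, and the authors' actual implementation may differ.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class count distribution."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(labels, has_fragment):
    """IG of a binary fragment flag with respect to binary class labels.

    labels:       1 for inhibitor, 0 for non-inhibitor (hypothetical encoding)
    has_fragment: True if the compound contains the fragment
    """
    n = len(labels)
    parent = entropy(sum(labels), n - sum(labels))
    children = 0.0
    for flag in (True, False):
        subset = [y for y, f in zip(labels, has_fragment) if f == flag]
        if subset:
            pos = sum(subset)
            children += len(subset) / n * entropy(pos, len(subset) - pos)
    return parent - children


if __name__ == "__main__":
    # Toy data only: four compounds, fragment perfectly separates the classes.
    toy_labels = [1, 1, 0, 0]
    toy_flags = [True, True, False, False]
    print(information_gain(toy_labels, toy_flags))
    print(mcc(tp=2, tn=2, fp=0, fn=0))
```

A fragment that splits the classes perfectly has an IG equal to the entropy of the class labels, while a fragment that appears equally often in both classes has an IG of zero. MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which makes it a reasonable summary for a dataset with unequal class sizes such as the 1098 versus 1701 split described above.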

Updated: 2020-03-05