In Silico Prediction of Blood–Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods
ChemMedChem (IF 3.4), Pub Date: 2018-09-21, DOI: 10.1002/cmdc.201800533
Zhuang Wang, Hongbin Yang, Zengrui Wu, Tianduanyi Wang, Weihua Li, Yun Tang, Guixia Liu

The blood–brain barrier (BBB), considered here as part of absorption, protects the central nervous system by separating brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all models were built using imbalanced data sets, which caused a high false-positive rate. We therefore addressed the problem of biased data sets and built a reliable classification model from 2358 compounds. Machine learning and resampling methods were used together to refine the models, with both 2D molecular descriptors and molecular fingerprints used to represent the chemicals. Through a series of evaluations, we found that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with edited nearest neighbor could effectively address the imbalance, and that the MACCS fingerprint combined with a support vector machine performed best. After construction of a final consensus model, the overall accuracy increased to 0.966 on the final external data set. The accuracy of the model on the test set was 0.919, with a well-balanced ability to predict BBB-positive compounds (sensitivity 0.925) and BBB-negative compounds (specificity 0.899). Compared with other BBB classification models, our models reduced the false-positive rate and were more robust in predicting both BBB-positive and BBB-negative compounds, which should be helpful in early drug discovery.
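
To make the resampling idea and the reported metrics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `smote` and `classification_metrics` are hypothetical, the toy data are invented for illustration, and the code assumes numeric feature vectors (for example 2D descriptors) with the positive class labeled 1. It shows only the core SMOTE step, interpolating between a minority sample and one of its nearest minority neighbors, and the standard definitions of accuracy, sensitivity, and specificity.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of `base` (Euclidean distance),
        # excluding the sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
    }


if __name__ == "__main__":
    # Toy minority class with three 2-feature samples (invented data).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(minority, n_new=4, k=2))
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In practice one would more likely rely on an existing library such as imbalanced-learn, which provides SMOTE and a SMOTE plus edited-nearest-neighbor combination, and apply resampling only to the training split so that test and external sets keep their original class balance.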

Updated: 2018-09-21