Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power
Briefings in Bioinformatics (IF 9.5), Pub Date: 2021-09-08, DOI: 10.1093/bib/bbab377
Tzu-Hui Yu, Bo-Han Su, Leo Chander Battalora, Sin Liu, Yufeng Jane Tseng

The trade-off between the predictive power of a machine learning (ML) or deep learning (DL) model and its interpretability has been a rising concern in central nervous system-related quantitative structure–activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. This lack of interpretability, and the resulting inability to provide simple rules, remains a challenge for CNS-QSAR models. To address these issues, we developed a protocol that combines the power of ML and DL to generate a set of simple rules that are easy to interpret and have high prediction power. The models were built with support vector machine and graph convolutional network algorithms on a data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive). Individual ML and DL models were also constructed for comparison. The performance of these models was evaluated on an additional external data set of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed its constituent traditional QSAR models, with an accuracy of 0.96 and an F1 score of 0.95. Drawing on the interpretability this protocol provides, our model laid down a set of simple physicochemical rules, based on six sub-structural features, for determining whether a compound can be a CNS drug. These rules displayed higher classification ability than classical guidelines, with higher specificity and mechanistic insights that go beyond blood–brain barrier permeability alone. This hybrid protocol can potentially be used for other drug property predictions.
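As a rough illustration of the kind of evaluation the abstract reports, the sketch below averages the class probabilities of two classifiers (a simple soft vote) and computes accuracy and F1 on the result. The abstract does not describe how the authors actually combine the support vector machine and graph convolutional network outputs, so the averaging rule, the function names, and the toy probabilities here are all assumptions made for illustration, not the paper's protocol or data.

```python
from typing import List, Sequence


def soft_vote(p_a: Sequence[float], p_b: Sequence[float],
              threshold: float = 0.5) -> List[int]:
    """Average two models' CNS-active probabilities and threshold the mean.

    Hypothetical combination rule; the paper's ensemble may differ.
    """
    if len(p_a) != len(p_b):
        raise ValueError("probability lists must have the same length")
    return [int((a + b) / 2 >= threshold) for a, b in zip(p_a, p_b)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (CNS-active) class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy, made-up values (1 = CNS-active, 0 = CNS-inactive); not from the paper.
y_true = [1, 1, 0, 0, 1, 0]
p_svm = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]  # hypothetical SVM probabilities
p_gcn = [0.7, 0.7, 0.3, 0.3, 0.1, 0.2]  # hypothetical GCN probabilities

y_pred = soft_vote(p_svm, p_gcn)
print(y_pred, accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

In this toy case the two models disagree on three compounds; averaging resolves two of them correctly and misses one active compound, which lowers recall and therefore F1 more than accuracy.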

Updated: 2021-09-08