Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power
Briefings in Bioinformatics (IF 6.8), Pub Date: 2021-09-08, DOI: 10.1093/bib/bbab377
Tzu-Hui Yu, Bo-Han Su, Leo Chander Battalora, Sin Liu, Yufeng Jane Tseng

The trade-off between the predictability and the interpretability of machine learning (ML) and deep learning (DL) models has been a rising concern in central nervous system-related quantitative structure–activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. This lack of interpretability, and the resulting difficulty of deriving simple rules, is a challenge for CNS-QSAR models. To address these issues, we developed a protocol that combines the power of ML and DL to generate a set of simple, easily interpreted rules with high prediction power. A data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive) was modeled with support vector machine and graph convolutional network algorithms. Individual ML and DL models were also constructed for comparison. The performance of these models was evaluated using an additional external data set of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed its constituent traditional QSAR models, with an accuracy of 0.96 and an F1 score of 0.95. Drawing on the interpretability this protocol provides, our model laid down a set of simple physicochemical rules, based on six sub-structural features, for determining whether a compound can be a CNS drug. These rules displayed higher classification ability than classical guidelines, with higher specificity and more mechanistic insight than rules that address blood–brain barrier permeability alone. This hybrid protocol can potentially be used for other drug property predictions.
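
The abstract does not say how the support vector machine and graph convolutional network outputs are combined, nor how the reported metrics were computed. As a rough illustration only, the sketch below assumes a simple soft-voting scheme (averaging two models' predicted probabilities) and computes accuracy, F1 and specificity from the resulting labels. The function names, the 0.5 threshold and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def soft_vote(p_a: List[float], p_b: List[float], threshold: float = 0.5) -> List[int]:
    """Average two models' class-1 probabilities and threshold the mean.

    Assumption: plain soft voting; the paper's actual ensemble rule may differ.
    """
    if len(p_a) != len(p_b):
        raise ValueError("probability lists must have equal length")
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(p_a, p_b)]


def metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, F1 and specificity for binary labels (1 = CNS-active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "f1": f1,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Made-up toy values, NOT data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    p_svm = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1, 0.3, 0.2]  # hypothetical SVM outputs
    p_gcn = [0.8, 0.7, 0.4, 0.1, 0.3, 0.2, 0.6, 0.1]  # hypothetical GCN outputs

    y_ens = soft_vote(p_svm, p_gcn)
    print("ensemble labels:", y_ens)
    print("ensemble metrics:", metrics(y_true, y_ens))
```

In this toy setting each model makes errors the other does not, so the averaged prediction can correct them; whether that holds on real data depends on how correlated the constituent models' errors are.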

Updated: 2021-09-08