Saagar–A New, Extensible Set of Molecular Substructures for QSAR/QSPR and Read-Across Predictions
Chemical Research in Toxicology (IF 4.1), Pub Date: 2020-12-25, DOI: 10.1021/acs.chemrestox.0c00464
Alexander Y Sedykh 1 , Ruchir R Shah 1 , Nicole C Kleinstreuer 2 , Scott S Auerbach 2 , Vijay K Gombar 1

Molecular structure-based predictive models provide a proven alternative to costly and inefficient animal testing. However, predictive models built with abstract molecular descriptors lack interpretability and have earned notoriety as black boxes. Interpretable models require interpretable descriptors to provide chemistry-backed predictive reasoning and facilitate intelligent molecular design. We developed a novel set of extensible chemistry-aware substructures, Saagar, to support interpretable predictive models and read-across protocols. The performance of Saagar in chemical characterization and in searching for structurally similar actives for read-across applications was compared with that of four publicly available fingerprint sets (MACCS (166), PubChem (881), ECFP4 (1024), ToxPrint (729)) on three benchmark sets (MUV, ULS, and Tox21) spanning ∼145 000 compounds and 78 molecular targets at 1%, 2%, 5%, and 10% false discovery rates. In 18 of the 20 comparisons, the interpretable Saagar features performed better than the publicly available fingerprints, which are less interpretable and of fixed bit length. Examples are provided to show the enhanced capability of Saagar in extracting compounds with higher scaffold similarity. Saagar features are interpretable and efficiently characterize diverse chemical collections, making them a better choice for building interpretable predictive in silico models and read-across protocols.
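
To make the read-across idea concrete, the sketch below ranks library compounds by the overlap of their substructure features with those of a query compound. It is a minimal illustration, not the authors' method: the abstract does not name a similarity metric, so the Tanimoto coefficient is an assumption, and the feature labels and compound names are hypothetical placeholders rather than actual Saagar substructures.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of substructure labels."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(
    query: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k library compounds most similar to the query."""
    scored = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    # Sort by descending similarity, breaking ties by compound name.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical compounds and feature labels (assumptions for illustration only).
library = {
    "cmpd_A": frozenset({"aromatic_ring", "nitro", "halogen"}),
    "cmpd_B": frozenset({"aromatic_ring", "primary_amine"}),
    "cmpd_C": frozenset({"ester", "alkyl_chain"}),
}
query = frozenset({"aromatic_ring", "nitro"})

ranked = nearest_neighbors(query, library, k=2)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Because each feature is a named substructure rather than a hashed bit, the shared labels behind a high score can be read off directly, which is the kind of interpretability the abstract argues for.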

Updated: 2021-02-15