“DompeKeys”: a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-02-23, DOI: 10.1186/s13321-024-00813-4
Candida Manelfi, Valerio Tazzari, Filippo Lunghini, Carmen Cerchia, Anna Fava, Alessandro Pedretti, Pieter F. W. Stouten, Giulio Vistoli, Andrea Rosario Beccari

The conversion of chemical structures into computer-readable descriptors that capture key structural aspects is of pivotal importance in cheminformatics and computer-aided drug design. Molecular fingerprints are a widely employed class of descriptors; however, generating them is time-consuming for large databases and is susceptible to bias. Descriptors that accurately detect predefined structural fragments without lengthy generation procedures would therefore be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists and suitable for indexing databases with trillions of compounds. To this end, we developed the DompeKeys (DK) as an integral part of EXSCALATE, Dompé’s end-to-end drug discovery platform. DK are a new substructure-based descriptor set that encodes the chemical features characterizing compounds of pharmaceutical interest. They form an exhaustive collection of curated SMARTS strings that define chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, corresponding to a network of hierarchically interconnected substructures. Because of their extended and hierarchical structure, DK can be used, with good performance, in different kinds of applications. In particular, we demonstrate that they are very well suited to effective mapping of chemical space, as well as to substructure search and virtual screening. Notably, incorporating DK yields high-performing machine learning models for predicting both compound activity and the occurrence of metabolic reactions.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.
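
To make the idea of hierarchically interconnected substructure keys more concrete, the sketch below shows one way such keys could support similarity comparison and database prescreening. It is only an illustration under stated assumptions: the key names and the `PARENTS` hierarchy are hypothetical and are not the published DompeKeys definitions, and the SMARTS matching step (which would need a cheminformatics toolkit) is skipped, so each molecule is given directly as the set of specific keys it is assumed to match.

```python
from collections import defaultdict

# Hypothetical hierarchy: each specific key lists the more general keys it implies.
# These names are illustrative only and are NOT the actual DompeKeys.
PARENTS = {
    "carboxylic_acid": ["hbond_donor", "hbond_acceptor", "anion"],
    "primary_amine": ["hbond_donor", "cation"],
    "amide": ["hbond_donor", "hbond_acceptor"],
    "phenyl": ["aromatic_ring"],
}


def expand(keys):
    """Return the given keys plus every more general key they imply."""
    seen = set()
    stack = list(keys)
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.add(key)
            stack.extend(PARENTS.get(key, []))
    return frozenset(seen)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two key sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def build_index(fingerprints):
    """Inverted index: key -> set of molecule ids whose fingerprint has that key."""
    index = defaultdict(set)
    for mol_id, keys in fingerprints.items():
        for key in keys:
            index[key].add(mol_id)
    return index


def prescreen(index, query_keys, all_ids):
    """Ids whose fingerprints contain every query key.

    These are only candidates; a full substructure match would still be
    needed to confirm each hit.
    """
    candidates = set(all_ids)
    for key in query_keys:
        candidates &= index.get(key, set())
    return candidates


# Assumed input: the specific keys matched by each (made-up) molecule.
FPS = {
    "mol_A": expand({"carboxylic_acid", "phenyl"}),
    "mol_B": expand({"primary_amine", "phenyl"}),
    "mol_C": expand({"amide"}),
}
INDEX = build_index(FPS)

if __name__ == "__main__":
    print(sorted(prescreen(INDEX, expand({"phenyl"}), FPS)))
    print(tanimoto(FPS["mol_A"], FPS["mol_B"]))
```

Propagating each specific key to its more general parents means a query for a broad feature (here, a hydrogen-bond donor) also retrieves molecules that were only matched on a specific functional group, which is one plausible reading of how a hierarchical key set could serve both interpretation and indexing.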

Updated: 2024-02-24