Mapping of the Available Chemical Space versus the Chemical Universe of Lead‐Like Compounds
ChemMedChem (IF 3.6), Pub Date: 2018-01-29, DOI: 10.1002/cmdc.201700561
Arkadii Lin 1 , Dragos Horvath 1 , Valentina Afonina 1, 2 , Gilles Marcou 1 , Jean-Louis Reymond 3 , Alexandre Varnek 1

This is, to our knowledge, the most comprehensive analysis to date of fragment‐like chemical space based on generative topographic mapping (GTM): 40 million molecules with no more than 17 heavy atoms, taken both from the theoretically enumerated GDB‐17 and from the real‐world PubChem and ChEMBL databases. The challenge was to prove that a robust map of fragment‐like chemical space can actually be built, despite the limited number (≪10⁵) of compounds (the "frame set") that can be used to fit the GTM manifold. An evolutionary map-building strategy was extended with a "coverage check" step, which discards manifolds that fail to accommodate compounds outside the frame set. The evolved map separates actives from inactives well on more than 20 external structure–activity sets, and it was shown to properly accommodate the entire collection of 40 million compounds. The map then served as a library comparison tool, highlighting the biases of real‐world molecules (PubChem and ChEMBL) relative to the universe of all possible species represented by FDB‐17, a fragment‐like subset of GDB‐17 containing 10 million molecules. Specific patterns present in some libraries and absent from others (diversity holes) were highlighted.
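To make the two map-level operations in the abstract more concrete, here is a minimal pure-Python sketch. It is not the authors' code. It assumes a GTM that has already been fitted, given as the list of node images `nodes` in descriptor space and a shared inverse variance `beta`, so that projecting a compound means computing its responsibilities over the nodes and its log-likelihood under the equal-weight Gaussian mixture. The "coverage check" here is a hypothetical stand-in: keep a map only if a minimum fraction of out-of-frame compounds reaches a log-likelihood threshold. "Diversity holes" are approximated as nodes well populated by one library's cumulated responsibilities but nearly empty for another. All function names and thresholds (`coverage_ok`, `diversity_holes`, `min_loglik`, `populated`, `empty`) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _log_gaussian(x: Vector, center: Vector, beta: float) -> float:
    """Log density of an isotropic Gaussian with inverse variance beta."""
    d = len(x)
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return 0.5 * d * math.log(beta / (2.0 * math.pi)) - 0.5 * beta * sq


def project(x: Vector, nodes: List[Vector], beta: float) -> Tuple[List[float], float]:
    """Return (responsibilities over nodes, log-likelihood of x) for a fitted GTM.

    Each node is the image of a latent grid point in descriptor space; the
    model density is an equal-weight mixture of Gaussians centred on the nodes.
    """
    logs = [_log_gaussian(x, y, beta) for y in nodes]
    m = max(logs)  # log-sum-exp trick for numerical stability
    s = sum(math.exp(l - m) for l in logs)
    resp = [math.exp(l - m) / s for l in logs]
    loglik = m + math.log(s) - math.log(len(nodes))
    return resp, loglik


def coverage_ok(external: List[Vector], nodes: List[Vector], beta: float,
                min_loglik: float, min_fraction: float) -> bool:
    """Hypothetical coverage check: accept the map only if enough compounds
    outside the frame set reach a log-likelihood of at least min_loglik."""
    covered = sum(1 for x in external if project(x, nodes, beta)[1] >= min_loglik)
    return covered / len(external) >= min_fraction


def density(library: List[Vector], nodes: List[Vector], beta: float) -> List[float]:
    """Cumulated responsibilities per node, divided by library size (sums to 1)."""
    totals = [0.0] * len(nodes)
    for x in library:
        resp, _ = project(x, nodes, beta)
        for k, r in enumerate(resp):
            totals[k] += r
    return [t / len(library) for t in totals]


def diversity_holes(lib_a: List[Vector], lib_b: List[Vector], nodes: List[Vector],
                    beta: float, populated: float = 0.05, empty: float = 0.01) -> List[int]:
    """Indices of nodes well populated by lib_a but (nearly) empty for lib_b."""
    da = density(lib_a, nodes, beta)
    db = density(lib_b, nodes, beta)
    return [k for k in range(len(nodes)) if da[k] >= populated and db[k] < empty]
```

As a toy usage, take three nodes at `[0, 0]`, `[5, 0]` and `[0, 5]` with `beta = 2.0`. A reference "universe" with one compound near each node, compared against a "real-world" library that only has compounds near the first two, gives `diversity_holes(...) == [2]`. In the actual study the descriptor space is high-dimensional and the map has many more nodes; the 2-D data here only serve to check the logic.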

Updated: 2018-01-29