Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning
Journal of Cheminformatics (IF 7.1), Pub Date: 2021-10-18, DOI: 10.1186/s13321-021-00559-3
Alice Capecchi, Jean-Louis Reymond

Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether the origin of an NP can be assigned from its molecular structure, using a subset of 60,171 NPs from the recently reported Collection of Open Natural Products (COCONUT) database that are assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin (https://tm.gdb.tools/map4/coconut_tmap/), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance (https://np-svm-map4.gdb.tools/). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms.
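
The abstract describes MAP4 only as a MinHashed atom pair fingerprint. As a rough illustration of the MinHash step alone, the sketch below hashes a set of features many times and keeps the minimum per hash, so that the fraction of matching positions between two fingerprints estimates the Jaccard similarity of the underlying sets. It is not the MAP4 implementation: character 3-grams of a SMILES string stand in for atom-pair shingles, and the function names (`shingles`, `minhash`, `estimated_jaccard`, `exact_jaccard`) and the default of 128 hashes are assumptions made for this example.

```python
from __future__ import annotations

import hashlib


def shingles(smiles: str, k: int = 3) -> set[str]:
    """Character k-grams of a SMILES string.

    Assumption: a simple stand-in for the atom-pair shingles used by MAP4.
    """
    if len(smiles) < k:
        return {smiles}
    return {smiles[i:i + k] for i in range(len(smiles) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """64-bit hash of a token; the seed selects one member of a hash family."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(features: set[str], n_hashes: int = 128) -> list[int]:
    """MinHash fingerprint: the minimum hash value per seeded hash function."""
    if not features:
        raise ValueError("features must not be empty")
    return [min(_hash(seed, f) for f in features) for seed in range(n_hashes)]


def estimated_jaccard(fp_a: list[int], fp_b: list[int]) -> float:
    """Fraction of positions where two fingerprints agree."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    caffeine = shingles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
    theobromine = shingles("CN1C=NC2=C1C(=O)NC(=O)N2C")
    estimate = estimated_jaccard(minhash(caffeine), minhash(theobromine))
    print("estimated:", estimate, "exact:", exact_jaccard(caffeine, theobromine))
```

Fingerprints of this kind can be compared position by position, which is what makes them usable as inputs for similarity-based layouts or kernel classifiers; how MAP4 fingerprints are actually fed to the TMAP and SVM steps is not specified in the abstract.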

Updated: 2021-10-19