Prediction of Natural Product Classes Using Machine Learning and ¹³C NMR Spectroscopic Data
Journal of Chemical Information and Modeling ( IF 5.6 ) Pub Date : 2020-06-15 , DOI: 10.1021/acs.jcim.0c00293
Saúl H Martínez-Treviño 1 , Víctor Uc-Cetina 2 , María A Fernández-Herrera 1 , Gabriel Merino 1

Structure elucidation of chemical compounds is a complex and challenging task that requires expertise and well-suited tools. ¹³C NMR is one of the most widely used techniques for assigning the molecular structure of a given compound because of the broad range of structural information it provides. Since molecules found in nature can be grouped into natural product (NP) classes on the basis of structural similarities, we explore whether the NP class of a compound can be predicted from its ¹³C NMR data. Using freely available ¹³C NMR data of NPs, we trained four classifiers to predict eight common NP classes. The XGBoost classifier performed best, reaching F1-scores above 0.82. We also performed experiments with different percentages of positive samples, including the presence of glycosides. Furthermore, we tested cases outside the data set, obtaining performances above 80% for most classes. For the chromans, restricting the test examples to the coumarin subclass increased the prediction accuracy to 100%.
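As a rough illustration of the kind of pipeline the abstract describes, the sketch below (Python, standard library only) turns a compound's list of ¹³C chemical shifts into a fixed-length feature vector and computes the per-class F1-score used to report performance. The binning scheme (0 to 240 ppm in 2 ppm bins), the one-vs-rest framing with one binary label per NP class, and the helper names `bin_shifts` and `f1_score` are all hypothetical choices for illustration, not the authors' published encoding. The classifier itself is left out; a vector like this could, for example, be passed to a gradient-boosting model such as `xgboost.XGBClassifier`.

```python
from typing import List, Sequence

# ASSUMPTION: shifts are binned over 0-240 ppm with 2 ppm bins.
# This is an illustrative encoding, not necessarily the one used in the paper.
MAX_PPM = 240.0
BIN_WIDTH = 2.0


def bin_shifts(shifts_ppm: Sequence[float],
               max_ppm: float = MAX_PPM,
               bin_width: float = BIN_WIDTH) -> List[int]:
    """Hypothetical helper: count how many 13C signals fall into each ppm bin."""
    n_bins = int(max_ppm / bin_width)
    vector = [0] * n_bins
    for shift in shifts_ppm:
        # Signals outside the assumed window are ignored.
        if 0.0 <= shift < max_ppm:
            vector[int(shift // bin_width)] += 1
    return vector


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R) for one NP class (1 = member, 0 = not).

    ASSUMPTION: each NP class is scored one-vs-rest.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    features = bin_shifts([170.5, 77.0, 21.3])
    print(len(features), sum(features))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Counting signals per bin keeps the vector length fixed regardless of how many carbons a compound has, which is what a tabular classifier needs; finer or coarser bins trade resolution against sparsity.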

Updated: 2020-07-27