MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra
Journal of Cheminformatics (IF 7.1), Pub Date: 2021-10-29, DOI: 10.1186/s13321-021-00558-4
Florian Huber 1, Sven van der Burg 1, Justin J J van der Hooft 2, Lars Ridder 1

Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compounds they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Spectral similarity measures are often used as a proxy for structural similarity, but this approach is strongly limited by the generally poor correlation between the two. Here, we propose MS2DeepScore: a novel Siamese neural network that predicts the structural similarity between two chemical structures based solely on their MS/MS fragmentation spectra. Using a cleaned dataset of more than 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model variants through Monte-Carlo Dropout is used to further improve the predictions and to assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. The prediction uncertainty estimate can additionally be used to select a subset of predictions with a root mean squared error of about 0.1. Furthermore, we demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra.
Taken together with the recently introduced unsupervised Spec2Vec metric, these results lead us to believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
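
The quantities named in the abstract (Tanimoto score, root mean squared error, and an uncertainty estimate from repeated stochastic predictions) can be illustrated with a minimal sketch. This is not the MS2DeepScore implementation or its API. The sketch assumes that molecular fingerprints are given as sets of "on" bit indices, that repeated Monte-Carlo Dropout predictions are summarised by their mean and standard deviation, and that `noisy_model` is a purely hypothetical stand-in for a trained network with dropout left active. The abstract specifies none of these details.

```python
import math
import random
import statistics


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) score of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rmse(predicted, actual):
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def mc_dropout_predict(stochastic_predict, pair, n_samples=10):
    """Call a stochastic predictor repeatedly and return (mean, spread).

    Assumption: mean and population standard deviation are used as the
    aggregate and the uncertainty estimate; other statistics are possible.
    """
    samples = [stochastic_predict(pair) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)


def select_confident(results, max_spread):
    """Keep only (score, spread) results whose spread is at most max_spread."""
    return [r for r in results if r[1] <= max_spread]


if __name__ == "__main__":
    rng = random.Random(0)
    fingerprint_1 = {1, 2, 3}
    fingerprint_2 = {2, 3, 4}
    # 2 shared bits out of 4 distinct bits gives 0.5
    truth = tanimoto(fingerprint_1, fingerprint_2)

    def noisy_model(pair):
        # Hypothetical stand-in for a network evaluated with dropout active:
        # it returns the true score plus random noise and ignores its input.
        return truth + rng.gauss(0.0, 0.05)

    score, spread = mc_dropout_predict(noisy_model, (fingerprint_1, fingerprint_2))
    print("reference:", truth, "predicted:", score, "spread:", spread)
    print("kept:", select_confident([(score, spread)], max_spread=0.1))
```

Filtering on the spread mirrors the idea of selecting a lower-error subset of predictions. The threshold of 0.1 in the demo is arbitrary and chosen only for illustration.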

Updated: 2021-10-29