MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC–MS metabolomics data
Metabolomics (IF 3.5), Pub Date: 2020-10-21, DOI: 10.1007/s11306-020-01738-3
Kelsey Chetnik 1 , Lauren Petrick 2, 3 , Gaurav Pandey 1, 3

Introduction

Despite the availability of several pre-processing software packages, poor peak integration remains a prevalent problem in untargeted metabolomics data generated using liquid chromatography high-resolution mass spectrometry (LC–MS). As a result, the output of these packages may retain incorrectly calculated metabolite abundances that can propagate into downstream analyses.

Objectives

To address this problem, we propose a computational methodology that combines machine learning and peak quality metrics to filter out low quality peaks.

Methods

Specifically, we systematically compared the performance of 24 classifiers, generated by combining eight classification algorithms with three sets of peak quality metrics, on the task of distinguishing reliably integrated peaks from poorly integrated ones. These classifiers were also compared with a relative standard deviation (RSD) cut-off applied to pooled quality-control (QC) samples, which aims to remove peaks with high analytical error.
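
The comparison grid described above (eight algorithms crossed with three metric sets, giving 24 candidates) can be sketched in Python as follows. This is an illustration only, not the MetaClean implementation, which is written in R. The abstract names only AdaBoost, so the other algorithm labels and all metric-set labels are placeholders, and `score_fn` is a hypothetical caller-supplied function that returns some evaluation score for one candidate.

```python
from itertools import product

# Placeholder labels: only AdaBoost is named in the abstract. The remaining
# seven algorithms and the three metric sets are NOT identified there.
ALGORITHMS = ["AdaBoost"] + [f"algorithm_{i}" for i in range(2, 9)]
METRIC_SETS = ["metric_set_A", "metric_set_B", "metric_set_C"]


def rank_candidates(algorithms, metric_sets, score_fn):
    """Score every (algorithm, metric set) pair and return them best-first.

    score_fn is a hypothetical callable taking (algorithm, metric_set) and
    returning a number where higher means better, for example a
    cross-validated evaluation measure chosen by the caller.
    """
    scored = [
        ((algorithm, metric_set), score_fn(algorithm, metric_set))
        for algorithm, metric_set in product(algorithms, metric_sets)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Keeping the scoring function separate from the enumeration means the same grid can be re-ranked under a different evaluation measure without changing the loop.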

Results

The best-performing classifier combined the AdaBoost algorithm with a set of 11 peak quality metrics previously explored in untargeted metabolomics and proteomics studies. As a complementary approach, applying our framework to peaks retained after filtering at 30% RSD across pooled QC samples further identified poorly integrated peaks that RSD filtering alone did not remove. An R implementation of these classifiers and the overall computational approach is available as the MetaClean package at https://CRAN.R-project.org/package=MetaClean.
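
The complementary two-stage use described above (RSD filtering first, then the classifier on the surviving peaks) might look like the following minimal Python sketch. It is not the MetaClean API. The function names, the dictionary layout, and the `classify` callable are assumptions made for illustration, and RSD is taken here to be the sample standard deviation divided by the mean, expressed as a percentage.

```python
from statistics import mean, stdev


def rsd_percent(intensities):
    """RSD (%) of one peak across pooled QC injections (needs >= 2 values)."""
    m = mean(intensities)
    if m == 0:
        return float("inf")  # treat an all-zero peak as failing any cut-off
    return 100.0 * stdev(intensities) / m


def two_stage_filter(qc_intensities, peak_metrics, classify, rsd_cutoff=30.0):
    """Keep peaks that pass the RSD cut-off AND the quality classifier.

    qc_intensities: dict of peak id -> list of intensities in pooled QC samples
    peak_metrics:   dict of peak id -> peak quality metrics for that peak
    classify:       hypothetical callable returning True for a well-integrated
                    peak (stands in for a trained classifier)
    """
    passed_rsd = [
        peak for peak, values in qc_intensities.items()
        if rsd_percent(values) <= rsd_cutoff
    ]
    return [peak for peak in passed_rsd if classify(peak_metrics[peak])]
```

Running the classifier only on peaks that survive the RSD step mirrors the order given in the abstract: a peak can be reproducible across QC injections and still be poorly integrated, which is the case the second stage is meant to catch.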

Conclusion

Our work represents an important step forward in developing an automated tool for filtering out unreliable peak integrations in untargeted LC–MS metabolomics data.



Updated: 2020-10-30