Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm. Proteome Science

Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm.
Proteome Science (IF 2.1), Pub Date: 2016-12-17, DOI: 10.1186/s12953-016-0107-8
Tyman E Stanford, Christopher J Bagley, Patty J Solomon

BACKGROUND Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation, which must be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by two issues: (i) within each spectrum, peaks are on average wider for peptides with higher mass-to-charge ratios (m/z), and (ii) optimising the baseline subtraction input arguments is a time-consuming and error-prone trial-and-error process. To address these complications, we present an automated pipeline that includes (i) a novel 'continuous' line segment algorithm that operates efficiently over data with a transformed m/z-axis, where the transformation removes the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.

RESULTS The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) relative to a gold-standard baseline-subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the relationship between peak width and peak location, resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel 'continuous' line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.

CONCLUSIONS The advantages of the proposed pipeline include informed and data-specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.
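
The abstract contrasts a naive sliding window with a faster alternative on a transformed m/z-axis, where points are no longer evenly spaced. The sketch below illustrates that general idea only. It is not the paper's 'continuous' line segment algorithm, whose details are not given in the abstract. It swaps in a standard monotonic-deque window minimum, and the function names, the square-root transformation, the half-width of 0.5 and the toy intensities are all assumptions made for illustration.

```python
from collections import deque
import math


def sliding_min_naive(x, y, half_width):
    """Window minimum of y over [x[i] - half_width, x[i] + half_width].

    Rescans every point for every window, so the cost is O(n^2).
    """
    out = []
    for xi in x:
        lo, hi = xi - half_width, xi + half_width
        out.append(min(yj for xj, yj in zip(x, y) if lo <= xj <= hi))
    return out


def sliding_min_deque(x, y, half_width):
    """Same result in O(n) using a monotonic deque.

    x must be sorted in ascending order but need not be evenly spaced.
    """
    n = len(x)
    out = []
    dq = deque()  # indices in the window; their y values increase front to back
    right = 0     # first index not yet admitted to the window
    for i in range(n):
        # Admit every point up to the right edge of the current window.
        while right < n and x[right] <= x[i] + half_width:
            # Points no smaller than the newcomer can never be a minimum again.
            while dq and y[dq[-1]] >= y[right]:
                dq.pop()
            dq.append(right)
            right += 1
        # Discard points that have fallen off the left edge.
        while x[dq[0]] < x[i] - half_width:
            dq.popleft()
        out.append(y[dq[0]])
    return out


# Toy spectrum (made-up values): m/z grid and integer intensities.
mz = [1000.0 + 20 * k for k in range(50)]
intensity = [10 + (k * 7) % 13 for k in range(50)]

# Hypothetical transformation of the m/z-axis; the abstract does not name
# the six transformations it evaluates.
t = [math.sqrt(m) for m in mz]

# A crude baseline estimate: the local minimum in a fixed-width window on
# the transformed axis, subtracted from the signal.
baseline = sliding_min_deque(t, intensity, 0.5)
corrected = [yi - bi for yi, bi in zip(intensity, baseline)]
```

A fixed window on the transformed axis corresponds to a window that widens with m/z on the original axis, which is the motivation the abstract gives for transforming before baseline estimation. A local minimum alone is a deliberately simple baseline estimator chosen to keep the sketch short.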

Updated: 2019-11-01
