Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics.
Journal of Proteome Research ( IF 3.8 ) Pub Date : 2020-03-16 , DOI: 10.1021/acs.jproteome.9b00736
Pavel Sulimov 1 , Attila Kertész-Farkas 1

Peptide-spectrum-match (PSM) scores used in database searching are calibrated to spectrum- or spectrum-peptide-specific null distributions. Some calibration methods rely on specific assumptions and use analytical models (e.g., binomial distributions), whereas other methods utilize exact empirical null distributions. The former may be inaccurate because of unjustified assumptions, while the latter are accurate, albeit computationally exhaustive. Here, we introduce a novel, nonparametric, heuristic PSM score calibration method, called Tailor, which calibrates PSM scores by dividing them with the top 100-quantile of the empirical, spectrum-specific null distributions (i.e., the score with an associated p-value of 0.01 at the tail, hence the name) observed during database searching. Tailor does not require any optimization steps or long calculations; it does not rely on any assumptions on the form of the score distribution (i.e., if it is, e.g., binomial); however, it relies on our empirical observation that the mean and the variance of the null distributions are correlated. In our benchmark, we re-calibrated the match scores of XCorr from Crux, HyperScore scores from X!Tandem, and the p-values from OMSSA with the Tailor method and obtained more spectrum annotations than with raw scores at any false discovery rate level. Moreover, Tailor provided slightly more annotations than E-values of X!Tandem and OMSSA and approached the performance of the computationally exhaustive exact p-value method for XCorr on spectrum data sets containing low-resolution fragmentation information (MS2) around 20–150 times faster. On high-resolution MS2 data sets, the Tailor method with XCorr achieved state-of-the-art performance and produced more annotations than the well-calibrated residue-evidence (Res-ev) score around 50–80 times faster.
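The calibration step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `tailor_calibrate` and the parameter `quantile_count` are hypothetical, and the handling of small candidate sets and non-positive reference scores is an assumption, since the abstract does not specify these details. The sketch takes all candidate-peptide scores observed for one spectrum as the empirical null, picks the score at the top 1/100 of that list, and divides every score by it.

```python
def tailor_calibrate(candidate_scores, quantile_count=100):
    """Divide each score by the score at the top 1/quantile_count of the
    candidate scores observed for a single spectrum.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not candidate_scores:
        raise ValueError("need at least one candidate score")

    # Rank the scores observed for this spectrum from best to worst.
    ranked = sorted(candidate_scores, reverse=True)

    # Position of the score whose empirical tail probability is about
    # 1/quantile_count (0.01 by default). Integer ceiling division avoids
    # floating-point rounding; small candidate sets fall back to the top score
    # (an assumption of this sketch).
    tail_size = -(-len(ranked) // quantile_count)
    reference = ranked[max(0, tail_size - 1)]

    if reference <= 0:
        # Assumption: dividing only makes sense for a positive reference score.
        raise ValueError("reference score must be positive")

    return [score / reference for score in candidate_scores]


if __name__ == "__main__":
    # Toy example: 1000 candidate scores for one spectrum.
    scores = [float(i) for i in range(1, 1001)]
    calibrated = tailor_calibrate(scores)
    print(max(calibrated))
```

Because the reference score comes from the candidates already scored during the search, this normalization needs only a sort (or a partial selection) per spectrum, which is consistent with the abstract's claim that no optimization step or long calculation is required.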

Updated: 2020-04-24