A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing.
Proteomics ( IF 3.4 ) Pub Date : 2020-05-16 , DOI: 10.1002/pmic.202000002
Lei Wang 1 , Kaiyuan Liu 1 , Sujun Li 1 , Haixu Tang 1

As MS/MS spectra accumulate in spectral libraries, spectral library searching has emerged as an important approach for peptide identification in proteomics, complementary to the commonly used protein database searching, in particular for proteomic analyses of well‐studied model organisms such as human. Existing spectral library searching algorithms compare a query MS/MS spectrum with every library spectrum of matching precursor mass and charge state, which can become computationally intensive as libraries grow rapidly. Here, the software msSLASH, which implements a fast spectral library searching algorithm based on the Locality‐Sensitive Hashing (LSH) technique, is presented. The algorithm first converts the library and query spectra into bit‐strings using LSH functions, and then computes the similarity only between spectra with highly similar bit‐strings. In spectral library searches of large real‐world MS/MS datasets, the algorithm significantly reduced the number of spectral comparisons and, as a result, achieved a 2–9X speedup over the existing spectral library searching algorithm SpectraST. The algorithm is implemented in C/C++ and is ready to be used in proteomic data analyses.
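
The two-stage idea described in the abstract (hash spectra to bit-strings, then score only those with similar bit-strings) can be illustrated with a minimal Python sketch. This is not the msSLASH implementation: the abstract does not name the LSH family, so the sketch assumes random-hyperplane (sign of projection) hashing, 1 m/z-unit binning, a 64-bit signature, a Hamming-distance cutoff for candidate selection, and cosine similarity as the final score. All function names and parameter values are hypothetical.

```python
import math
import random

NUM_BINS = 2000  # assumption: one bin per m/z unit, up to m/z 2000
NUM_BITS = 64    # assumption: length of the bit-string signature


def bin_spectrum(peaks, num_bins=NUM_BINS):
    """Turn (m/z, intensity) peaks into a fixed-length intensity vector."""
    vec = [0.0] * num_bins
    for mz, intensity in peaks:
        idx = int(mz)
        if 0 <= idx < num_bins:
            vec[idx] += intensity
    return vec


def make_hyperplanes(num_bits=NUM_BITS, num_bins=NUM_BINS, seed=0):
    """Draw one random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_bins)]
            for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Pack one bit per hyperplane: 1 if the projection is non-negative."""
    sig = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig


def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_library(named_peaks, hyperplanes):
    """Precompute (name, vector, signature) for every library spectrum."""
    library = []
    for name, peaks in named_peaks:
        vec = bin_spectrum(peaks)
        library.append((name, vec, signature(vec, hyperplanes)))
    return library


def search(query_peaks, library, hyperplanes, max_hamming=16):
    """Score only library spectra whose signature is close to the query's.

    Returns (best_match, number_of_full_comparisons); best_match is a
    (name, cosine_score) pair, or None if no candidate passed the filter.
    """
    qvec = bin_spectrum(query_peaks)
    qsig = signature(qvec, hyperplanes)
    best, compared = None, 0
    for name, vec, sig in library:
        if hamming(qsig, sig) > max_hamming:
            continue  # cheap bit-string filter skips the full comparison
        compared += 1
        score = cosine(qvec, vec)
        if best is None or score > best[1]:
            best = (name, score)
    return best, compared


if __name__ == "__main__":
    planes = make_hyperplanes()
    lib = build_library(
        [("A", [(175.1, 30.0), (304.2, 100.0), (433.2, 55.0)]),
         ("B", [(147.1, 80.0), (260.2, 40.0), (989.5, 100.0)])],
        planes,
    )
    print(search([(175.1, 30.0), (304.2, 100.0), (433.2, 55.0)], lib, planes))
```

The sketch still visits every signature linearly; a practical implementation would more likely index signatures in hash tables so that candidates are retrieved by bucket lookup, and would first restrict the library by precursor mass and charge state as the abstract describes.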

Updated: 2020-05-16