Comparison of different variant sequence types coupled with decoy generation methods used in concatenated target-decoy database searches for proteogenomic research
Journal of Proteomics (IF 2.8), Pub Date: 2020-10-24, DOI: 10.1016/j.jprot.2020.104021
Wai-Kok Choong, Ting-Yi Sung

Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, two types of sequence databases, protein-based and peptide-based, are used to store variant sequences for these searches. A protein-based database records each full-length wild-type protein sequence with the original amino acids replaced according to the given variant events, whereas a peptide-based database retains only the in silico digested peptides that contain the variants. However, how the various decoy generation methods perform on a peptide-based variant sequence database, compared with a protein-based one, remains unclear. In this paper, we conduct a thorough comparison of target-decoy databases constructed from these two database types coupled with various decoy generation methods for proteogenomic analyses. The results show that, for the protein-based variant sequence database, the reverse and pseudo reverse methods achieve similar performance in variant peptide identification. For the peptide-based database, the pseudo reverse method is more suitable than the widely used reverse method: it identifies 6% more variant PSMs in a HEK293 cell line data set.
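
The two decoy generation methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the digestion rule is a simplifying assumption: cleavage after every K or R, with no proline exception and no missed cleavages. The reverse method reverses an entire sequence. The pseudo reverse method, as commonly described, reverses each tryptic peptide while keeping its C-terminal residue in place.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split after every K or R (assumed rule: no proline exception, no missed cleavages)."""
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # trailing peptide that does not end in K or R
        peptides.append(current)
    return peptides


def reverse_decoy(sequence: str) -> str:
    """Reverse method: reverse the whole sequence."""
    return sequence[::-1]


def pseudo_reverse_peptide(peptide: str) -> str:
    """Pseudo reverse method: reverse all residues except the last one."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


def pseudo_reverse_protein(protein: str) -> str:
    """Apply the pseudo reverse method to each tryptic peptide of a protein and rejoin."""
    return "".join(pseudo_reverse_peptide(p) for p in tryptic_digest(protein))


# A peptide-based entry (illustrative sequence, not taken from the paper)
target = "PEPTIDEK"
print(reverse_decoy(target))                   # KEDITPEP
print(pseudo_reverse_peptide(target))          # EDITPEPK
print(tryptic_digest(reverse_decoy(target)))   # ['K', 'EDITPEP']
```

In this sketch, reversing a tryptic peptide moves its C-terminal K to the N-terminus, so re-digesting the reversed decoy yields peptides that differ from the target in length and mass. The pseudo reverse decoy keeps the target's length, composition, and tryptic C-terminus. The abstract does not state whether this difference explains the reported results.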

Significance

In our survey of publications on proteogenomic studies, 57% of the studies adopt a peptide-based variant sequence database coupled with the reverse method for decoy generation to construct the target-decoy database used in searches. However, our results show that, with a peptide-based variant sequence database, it is better to generate decoy sequences with the pseudo reverse method; otherwise, fewer variant peptides are identified.




Updated: 2020-12-01