An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics
Genome Biology (IF 10.1), Pub Date: 2022-06-20, DOI: 10.1186/s13059-022-02701-2
Laura Fancello, Thomas Burger

Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach. We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible. In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable to excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.
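
The two mechanisms the abstract contrasts can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the `(score, is_decoy)` representation of peptide-spectrum matches, and the `+1` correction in the decoy count are assumptions made here. The first function shows a common form of the target-decoy competition FDR estimate; the second shows post-hoc filtering of identified proteins against a set of proteins with expressed transcripts.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: each PSM is (score, is_decoy).
PSM = Tuple[float, bool]


def tdc_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Uses the (decoys + 1) / targets estimate, one common variant of
    target-decoy competition; other variants omit the +1.
    """
    psms = list(psms)
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, (decoys + 1) / targets)


def filter_by_expression(protein_ids: Iterable[str], expressed: Set[str]) -> List[str]:
    """Keep only identified proteins whose transcript is in `expressed`.

    This is the post-hoc alternative: search the full reference database,
    then drop proteins without transcript evidence.
    """
    return [pid for pid in protein_ids if pid in expressed]


if __name__ == "__main__":
    example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (3.0, True)]
    print(tdc_fdr(example_psms, threshold=7.0))
    print(filter_by_expression(["P1", "P2", "P3"], expressed={"P1", "P3"}))
```

The estimate depends entirely on counting decoy matches above the threshold. With a very small database few decoys reach that range, so the count becomes a noisy stand-in for the number of incorrect target matches, which is the weakness the abstract attributes to target-decoy competition.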

Updated: 2022-06-20