Proteogenomics-Guided Evaluation of RNA-Seq Assembly and Protein Database Construction for Emergent Model Organisms.
Proteomics ( IF 3.4 ) Pub Date : 2020-04-06 , DOI: 10.1002/pmic.201900261
Yannick Cogne 1 , Duarte Gouveia 1 , Arnaud Chaumot 2 , Davide Degli-Esposti 2 , Olivier Geffard 2 , Olivier Pible 1 , Christine Almunia 1 , Jean Armengaud 1

Proteogenomics is gaining momentum now that genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species whose genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments of Illumina RNA-seq reads prior to assembly are explored, with the aim of translating the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used, and the most relevant procedure is identified by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation using Transdecoder is achieved with a minimal open reading frame (ORF) length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.
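
The read pre-treatment described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the filtering tool, so the function names (mean_phred, parse_fastq, filter_by_mean_quality), the Phred+33 encoding, and the use of an arithmetic mean of per-base scores are all assumptions made for illustration. Only the threshold of 17 comes from the abstract.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality string); a hypothetical record type for this sketch
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (assumes Phred+33 encoding)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from strict 4-line FASTQ text (no multi-line sequences)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def filter_by_mean_quality(
    records: Iterable[FastqRecord], min_mean_q: float = 17.0
) -> Iterator[FastqRecord]:
    """Keep reads whose mean quality is at least min_mean_q (drop those below it)."""
    for header, seq, qual in records:
        if mean_phred(qual) >= min_mean_q:
            yield header, seq, qual
```

In this sketch a read whose mean score is exactly 17 is kept, matching the abstract's wording of removing reads with a score "less than 17". Paired-end reads would need both mates dropped together, which is not handled here.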

Updated: 2020-04-06