A decade of de novo transcriptome assembly: Are we there yet?
Molecular Ecology Resources (IF 7.7), Pub Date: 2020-10-08, DOI: 10.1111/1755-0998.13268
Martin Hölzer 1, 2, 3

A decade ago, de novo transcriptome assembly evolved as a versatile and powerful approach to make evolutionary assumptions, analyse gene expression, and annotate novel transcripts, in particular for non-model organisms lacking an appropriate reference genome. Various tools have been developed to generate a transcriptome assembly, and even more computational methods depend on the results of these tools for downstream analyses. In this issue of Molecular Ecology Resources, Freedman et al. (Mol Ecol Resour 2020) present a comprehensive analysis of errors in de novo transcriptome assemblies across public data sets and different assembly methods. They focus on two implicit assumptions that are often violated: first, that the assembly presents an unbiased view of the transcriptome; second, that the expression estimates derived from the assembly are reasonable, albeit noisy, approximations of the relative frequency of expressed transcripts. They show that appropriate filtering can reduce this bias but can also lead to the loss of a reasonable number of highly expressed transcripts. Thus, to partly alleviate the noise in expression estimates, they propose a new normalization method called length-rescaled CPM. Remarkably, the authors found considerable distortions at the nucleotide level, which lead to an underestimation of diversity in transcriptome assemblies. The study by Freedman et al. (Mol Ecol Resour 2020) clearly shows that we have not yet reached “high-quality” in the field of transcriptome assembly. Above all, it helps researchers to be aware of these problems and to filter and interpret their transcriptome assembly data appropriately and with caution.

Updated: 2020-12-17