Assessing genome assembly quality prior to downstream analysis: N50 versus BUSCO
Molecular Ecology Resources (IF 7.7), Pub Date: 2021-02-24, DOI: 10.1111/1755-0998.13364
April A Jauhal (1, 2), Richard D Newcomb (2)

With the ever-increasing number of publicly available eukaryotic genome assemblies and user-friendly bioinformatics tools, researchers have growing opportunities to use genomic resources in their work. Although genome quality has multiple dimensions, it is often reduced to a single score that may neither correlate with other metrics nor suit every application of an assembly. To assess whether the commonly reported N50 value reliably predicts a separate dimension of genome quality, gene space completeness, we performed a meta-analysis of 611 published articles on eukaryotic genomes that reported BUSCO scores in addition to the typical N50 score. We found that although assemblies with relatively high contig and scaffold N50 values consistently had high BUSCO scores, a high BUSCO score could also be obtained from assemblies with a low N50. This reinforces that, despite its ubiquity, N50 is not a perfect proxy for all measures of genome accuracy. Our data also suggest that variation in BUSCO scores among assemblies with poor N50 values may be related to the number of introns in conserved eukaryotic genes. We stress the importance of screening and evaluating assembly quality with the appropriate tools and urge researchers to report genome assessment metrics beyond N50. We also discuss potential limitations of BUSCO and suggest improvements for assessing gene space within genome assemblies.
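To make the contrast between the two metrics concrete, here is a minimal Python sketch (not taken from the paper) of how each number is derived. The helper names `n50` and `busco_summary` are hypothetical. `n50` follows the standard definition: the length L such that sequences of length L or longer hold at least half of the assembled bases. `busco_summary` only formats counts you already have into a line laid out like BUSCO's short summary (Complete = Single-copy + Duplicated, then Fragmented, Missing, and the total n). It does not perform the ortholog search BUSCO actually runs, and its simple rounding may differ slightly from BUSCO's own output. All numbers in the demo are invented toy values, not data from the meta-analysis.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequence lengths given")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    return sorted_lengths[-1]  # not reached for non-negative lengths


def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style category counts as a one-line percentage summary.

    Hypothetical helper: it mimics the layout of BUSCO's short summary,
    C:x%[S:y%,D:z%],F:a%,M:b%,n:N, from counts supplied by the caller.
    """
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count: int) -> str:
        return f"{100 * count / n:.1f}%"

    complete = single + duplicated  # "Complete" includes duplicated copies
    return (
        f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
        f"F:{pct(fragmented)},M:{pct(missing)},n:{n}"
    )


if __name__ == "__main__":
    # Hypothetical toy assembly: 1,000 short contigs of 2 kb each.
    fragmented_assembly = [2_000] * 1_000
    print("N50:", n50(fragmented_assembly))  # N50: 2000
    # Hypothetical BUSCO counts for the same toy assembly.
    print(busco_summary(single=240, duplicated=5, fragmented=5, missing=5))
    # C:96.1%[S:94.1%,D:2.0%],F:2.0%,M:2.0%,n:255
```

The toy demo mirrors the abstract's point: an assembly made only of 2 kb contigs has a low N50, yet nothing in the N50 calculation constrains how many conserved genes fall entirely within those contigs, so gene-space completeness can still be high.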

Updated: 2021-02-24