LDscaff: LD-based scaffolding of de novo genome assemblies
BMC Bioinformatics ( IF 3 ) Pub Date : 2020-12-28 , DOI: 10.1186/s12859-020-03895-7
Zicheng Zhao , Yingxiao Zhou , Shuai Wang , Xiuqing Zhang , Changfa Wang , Shuaicheng Li

Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which combines several sequencing technologies, increases both contiguity and accuracy. Such approaches require extra, costly sequencing efforts, yet the information contained in millions of existing whole-genome sequencing datasets has not been fully utilized for the task of scaffolding. Genetic recombination patterns in population data indicate non-random association among alleles at different loci and can provide physical distance signals to guide scaffolding. In this paper, we propose LDscaff, a method for draft genome assembly that incorporates linkage disequilibrium (LD) information from population data. We evaluated the performance of our method on both simulated and real data. We simulated scaffolds by splitting the pig reference genome and then reassembled them. Gaps ranging from 0 to 100 kb were introduced between scaffolds. The genome misassembly rate is 2.43% when there is no gap. We then applied our method to refine the giant panda genome and the donkey genome, both of which were assembled purely from NGS data. After LDscaff treatment, the resulting panda assembly has a scaffold N50 of 3.6 Mb, 2.5 times larger than the original N50 (1.3 Mb). The re-assembled donkey assembly has an N50 length improved from 23.8 Mb to 32.1 Mb. Our method effectively improves assemblies using existing re-sequencing data, and is a potential alternative to existing assemblers that require the collection of new data.
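
The two quantities the abstract relies on, pairwise LD between loci and scaffold N50, can be illustrated with a minimal Python sketch. This is not the LDscaff implementation: the abstract does not state which LD statistic or scoring scheme the authors use, so the r² statistic on phased 0/1 haplotypes and the function names `r_squared` and `n50` are assumptions chosen for illustration only.

```python
from typing import Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two biallelic loci.

    Assumption: inputs are phased haplotypes coded 0/1, one entry per
    sampled chromosome, in the same sample order for both loci.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # a monomorphic locus carries no LD signal
    return d * d / denom


def n50(lengths: Sequence[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    if len(lengths) == 0:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("scaffold lengths must be positive")
```

Under these assumptions, markers at the ends of two scaffolds that are physically adjacent would tend to show higher `r_squared` values than markers on distant scaffolds, which is the kind of distance signal the abstract describes; joining scaffolds then raises the value returned by `n50`.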

Updated: 2020-12-28