The Evidential Statistics of Genetic Assembly: Bootstrapping a Reference Sequence
Frontiers in Ecology and Evolution (IF 2.4), Pub Date: 2021-07-01, DOI: 10.3389/fevo.2021.614374
Yukihiko Toquenaga, Takuya Gagné

Reference sequences play an essential role in genome assembly, much as type specimens do in taxonomy. These references are themselves samples, obtained at a particular time and location with a specific method. How can we evaluate and distinguish the uncertainties of the reference itself from those of the assembly methods? Here we bootstrapped 50 random read data sets from the small circular genome of an Escherichia coli bacteriophage, phiX174, and tried to reconstruct the reference with 14 free assembly programs. Nine of the 14 assembly programs were capable of circular genome reconstruction. Unicycler correctly reconstructed the reference for 44 of the 50 data sets; each reconstructed contig from the six failed data sets had minor defects. The other assembly programs reconstructed the reference with minor defects. The defect regions differed among the assembly programs, and the defect locations were far from randomly distributed in the reference genome. All Trinity contigs included one perfect copy of the reference, whereas Minia contigs included two, in addition to an imperfect reference copy. The centroids of the contigs for the assembly programs other than Unicycler differed from the reference by at most 75 bases. Nonmetric multidimensional scaling (NMDS) plots of the centroids indicated that even the reference sequence was located slightly off from the estimated location of the true reference. We propose that the combination of bootstrapping a reference, making consensus contigs as centroids in an edit distance, and NMDS plotting will provide an evidential statistical approach to genetic assembly for non-fragmented base sequences.
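
The idea of a centroid of contigs in an edit distance can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a plain Levenshtein distance, ignores the rotation and strand ambiguity of circular contigs, and approximates the centroid by the medoid, that is, the observed contig with the smallest total distance to all the others. The function names and the toy contigs are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string so each row stays short
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def distance_matrix(seqs: Sequence[str]) -> List[List[int]]:
    """Symmetric matrix of pairwise edit distances."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = edit_distance(seqs[i], seqs[j])
    return matrix


def medoid_index(matrix: Sequence[Sequence[int]]) -> int:
    """Index of the sequence with the smallest summed distance to the rest.

    Assumption: the medoid stands in for the centroid; a true consensus
    sequence need not coincide with any observed contig.
    """
    return min(range(len(matrix)), key=lambda i: sum(matrix[i]))


if __name__ == "__main__":
    # Hypothetical toy contigs, far shorter than the phiX174 genome.
    contigs = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACGAACGT", "ACGTCGT"]
    matrix = distance_matrix(contigs)
    print("medoid:", contigs[medoid_index(matrix)])
```

The same pairwise distance matrix could be handed to any NMDS routine that accepts precomputed dissimilarities to place the contigs, their centroid, and the reference on a common plot.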

Updated: 2021-07-01