Statistical Mitogenome Assembly with RepeaTs.
Journal of Computational Biology (IF 1.4), Pub Date: 2020-09-04, DOI: 10.1089/cmb.2019.0505
Fahad Alqahtani, Ion I Măndoiu

By using next-generation sequencing technologies, it is possible to quickly and inexpensively generate large numbers of relatively short reads from both the nuclear and mitochondrial DNA (mtDNA) contained in a biological sample. Unfortunately, assembling such whole-genome sequencing (WGS) data with standard de novo assemblers often fails to generate high-quality mitochondrial genome sequences because of the large difference in copy number (and hence sequencing depth) between the mitochondrial and nuclear genomes. Assembly of complete mitochondrial genome sequences is further complicated by the fact that many de novo assemblers are not designed for circular genomes and by the presence of repeats in the mitochondrial genomes of some species. In this article, we describe the Statistical Mitogenome Assembly with RepeaTs (SMART) pipeline for automated assembly of mitochondrial genomes from WGS data. SMART first uses an efficient coverage-based filter to select a subset of reads enriched in mtDNA sequences. Contigs produced by an initial assembly step are filtered using Basic Local Alignment Search Tool (BLAST) searches against a comprehensive mitochondrial genome database and are then used as “baits” for an alignment-based filter that produces the set of reads used in a second de novo assembly and scaffolding step. In the presence of repeats, the possible paths through the assembly graph are evaluated using a maximum likelihood model. Additionally, the assembly process is repeated a user-specified number of times on resampled subsets of reads, and the reconstructed sequences with the highest bootstrap support are selected for annotation. Experiments on WGS data sets from a variety of species show that the SMART pipeline produces complete circular mitochondrial genome sequences with a higher success rate than current state-of-the-art tools, particularly for low-coverage WGS data sets.
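The abstract does not say how the coverage-based filter is implemented. As a rough illustration of the idea only (mtDNA has a much higher copy number, so k-mers from mitochondrial reads tend to be seen far more often than k-mers from nuclear reads), the sketch below keeps reads whose median k-mer count reaches a threshold. The function names `kmers` and `coverage_filter`, the parameters `k` and `min_median_count`, and the toy reads are all assumptions made for this example, not part of SMART.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def coverage_filter(reads, k=5, min_median_count=3):
    """Keep reads whose median k-mer count is at least min_median_count.

    Hypothetical stand-in for a coverage-based filter; the threshold and
    the use of the median are assumptions, not SMART's actual rule.
    """
    # Count every k-mer across the whole read set.
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        read_counts = [counts[km] for km in kmers(read, k)]
        # Reads shorter than k have no k-mers and are dropped.
        if read_counts and median(read_counts) >= min_median_count:
            kept.append(read)
    return kept


# Toy data: a made-up "mitochondrial" sequence sampled deeply,
# plus two made-up "nuclear" reads that are each seen once.
mito = "ACGTTGCAAGCTTAGGCTAACG"
mito_reads = [mito[i:i + 12] for i in range(0, 10, 2)] * 4
nuclear_reads = ["TTTTGGGGCCCCAAAA", "GATTACAGATTACAGA"]

selected = coverage_filter(mito_reads + nuclear_reads, k=5, min_median_count=3)
print(len(selected), "of", len(mito_reads) + len(nuclear_reads), "reads kept")
```

On real WGS data the k-mer length would be much larger and the threshold would have to be derived from the observed count distribution; both values here are chosen only so the toy example separates the two read groups.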

Updated: 2020-09-14