Family reunion via error correction: an efficient analysis of duplex sequencing data.
BMC Bioinformatics (IF 3), Pub Date: 2020-03-04, DOI: 10.1186/s12859-020-3419-8
Nicholas Stoler, Barbara Arbeithuber, Gundula Povysil, Monika Heinzl, Renato Salazar, Kateryna D Makova, Irene Tiemann-Boege, Anton Nekrutenko

BACKGROUND: Duplex sequencing is the most accurate approach for identifying sequence variants present at very low frequencies. Its power comes from pooling multiple descendants of both strands of each original DNA molecule, which makes it possible to distinguish true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole-genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are discarded.

RESULTS: In this paper we demonstrate that a significant fraction of these reads contain PCR or sequencing errors within their duplex tags. Correcting such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost-effective.

CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.
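
The idea of "reuniting" singleton reads with their families can be illustrated with a minimal Python sketch. This is not Du Novo's implementation: the abstract does not describe the actual algorithm, so the sketch assumes that tags are compared by plain Hamming distance and that a singleton is moved only when exactly one larger family lies within a small mismatch threshold. The function names, the `max_mismatches` parameter, and the example tags and read names are all hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(families, max_mismatches=1):
    """Merge single-read families into a nearby larger family.

    families: dict mapping tag -> list of read identifiers.
    Assumption (not from the paper): a singleton is reassigned only when
    exactly one multi-read family has a tag within max_mismatches, so
    ambiguous cases are left untouched.
    """
    multi = {tag for tag, reads in families.items() if len(reads) > 1}
    merged = defaultdict(list)
    for tag, reads in families.items():
        target = tag
        if len(reads) == 1:
            near = [
                t for t in multi
                if len(t) == len(tag) and hamming(t, tag) <= max_mismatches
            ]
            if len(near) == 1:
                target = near[0]
        merged[target].extend(reads)
    return dict(merged)


# Hypothetical toy input: one real family and two singletons.
families = {
    "ACGTACGT": ["r1", "r2", "r3"],
    "ACGTACGA": ["r4"],  # one mismatch from the family above
    "TTTTCCCC": ["r5"],  # no close neighbour, stays a singleton
}
corrected = reunite_singletons(families)
```

In this toy input, read `r4` joins the `ACGTACGT` family because its tag differs at a single position, while `r5` remains a singleton. A real pipeline would also need to handle tag pairs from both strands and to scale beyond all-against-all comparison, which this sketch does not attempt.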

Updated: 2020-03-04