Jointly aligning a group of DNA reads improves accuracy of identifying large deletions
Nucleic Acids Research ( IF 16.6 ) Pub Date : 2017-11-22 , DOI: 10.1093/nar/gkx1175
Anish M S Shrestha 1 , Martin C Frith 1, 2, 3 , Kiyoshi Asai 1, 2 , Hugues Richard 4

Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to align each DNA read to a reference genome independently. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities on these alignments, and consequently on the variant calls: with current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby the alignment ambiguity of one read can be resolved by other reads. We show that this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time that is on par with current tools. A software implementation is available as an open-source Python program called JRA at https://bitbucket.org/jointreadalignment/jra-src.
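
The idea that one read's ambiguity can be resolved by other reads can be illustrated with a toy sketch. This is not JRA's algorithm, which the abstract does not describe in detail; it is a simplified voting scheme under stated assumptions. The function name `choose_deletion_jointly`, the representation of a candidate deletion as a `(start, length)` tuple, and all scores below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def choose_deletion_jointly(candidates_per_read):
    """Pick the candidate deletion with the highest summed score over all reads.

    candidates_per_read: list of dicts, one per read, mapping a hypothetical
    (start, length) deletion placement to that read's alignment score.
    A read that does not list a placement contributes nothing to it.
    """
    totals = defaultdict(float)
    for candidates in candidates_per_read:
        for deletion, score in candidates.items():
            totals[deletion] += score
    return max(totals, key=totals.get)

# Made-up scores for three reads spanning the same repeat-rich region.
reads = [
    {(100, 50): 40, (130, 50): 40},  # tie: this read alone cannot decide
    {(100, 50): 42, (130, 50): 35},
    {(100, 50): 41},
]

best = choose_deletion_jointly(reads)
print(best)
```

Aligned on its own, the first read supports two placements equally well; pooling the scores of all three reads favours a single placement, which is the intuition behind joint alignment.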

Updated: 2017-11-22