2-kupl: mapping-free variant detection from DNA-seq data of matched samples
bioRxiv - Bioinformatics, Pub Date: 2021-01-19, DOI: 10.1101/2021.01.17.427048
Yunfeng Wang, Haoliang Xue, Christine Pourcel, Yang Du, Daniel Gautheret

The detection of genome variants, including point mutations, indels and structural variants, is a fundamental and challenging computational problem. We address here the problem of variant detection between two deep-sequencing (DNA-seq) samples, such as two human samples from an individual patient, or two samples from distinct bacterial strains. The preferred strategy in such a case is to align each sample to a common reference genome, collect all variants and compare these variants between samples. Such mapping-based protocols have several limitations. DNA sequences with large indels, aggregated mutations and structural variants are hard to map to the reference. Furthermore, DNA sequences cannot be mapped reliably to genomic low complexity regions and repeats. Herein, we introduce 2-kupl, a k-mer based, mapping-free protocol to detect variants between two DNA-seq samples. On simulated and actual data, 2-kupl achieves a higher precision than other mapping-free protocols. Applying 2-kupl to prostate cancer whole exome data, we identify a number of candidate variants in hard-to-map regions and propose potential novel recurrent variants in this disease.
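
The general idea behind a k-mer based, mapping-free comparison can be illustrated with a minimal sketch. It is not the 2-kupl implementation, whose details the abstract does not give. It only assumes that a variant in one sample produces k-mers that are absent from the matched sample. The function names, the `min_count` threshold and the toy reads are hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length substring across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(case_reads, control_reads, k=5, min_count=2):
    """Return k-mers seen at least min_count times in the case sample
    and never in the control sample (hypothetical filter, for illustration)."""
    case = count_kmers(case_reads, k)
    control = count_kmers(control_reads, k)
    return {kmer for kmer, n in case.items()
            if n >= min_count and kmer not in control}


# Toy data: the case reads carry a single C->T substitution.
control_reads = ["ACGTACGGTCA"] * 2
case_reads = ["ACGTATGGTCA"] * 2
specific = sample_specific_kmers(case_reads, control_reads, k=5, min_count=2)
print(sorted(specific))
```

In this toy case a single substitution yields k sample-specific k-mers (here five), all overlapping the variant position. No reference genome is involved at any step.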

Updated: 2021-01-20