The impact of post-alignment processing procedures on whole-exome sequencing data
Genetics and Molecular Biology (IF 1.7), Pub Date: 2020-01-01, DOI: 10.1590/1678-4685-gmb-2020-0047
Murilo Guimarães Borges 1,2,3; Helena Tadiello de Moraes 1,2; Cristiane de Souza Rocha 1,2; Iscia Lopes-Cendes 1,2

Post-alignment procedures have been suggested as a way to prevent the identification of false positives in massive DNA sequencing data. Insertions and deletions are the variants most likely to be misinterpreted by variant calling algorithms. Using known genetic variants as references in post-processing pipelines can minimize mismatches: they allow reads to be correctly realigned and recalibrated, resulting in more parsimonious variant calling. In this work, we investigate the impact of using different sets of common variants as references for variant calling from whole-exome sequencing data. We selected reference variants from the common insertions and deletions available in the 1K Genomes project data and in the Latin American Database of Genetic Variation (LatinGen). We used the Genome Analysis Toolkit to perform post-processing procedures, namely local realignment and quality recalibration, followed by variant calling in whole-exome samples. For all groups, the call set contained more variants when no post-processing procedure was performed. We also found a higher concordance rate between the variants called using 1K Genomes and those called using LatinGen. We therefore believe that the additional rare variants identified without realignment or quality recalibration were likely false positives.
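
The abstract compares call sets by their concordance rate but does not define the metric here. As a minimal sketch, assuming concordance means the number of shared variants divided by the number of variants seen in either call set (a Jaccard index), and assuming variants are matched on chromosome, position, reference allele, and alternate allele, two call sets could be compared as follows. The function names `parse_vcf_lines` and `concordance` are hypothetical and are not taken from the paper or from the Genome Analysis Toolkit.

```python
from typing import Iterable, Set, Tuple

# A variant is identified here by (chrom, pos, ref, alt); this matching key
# is an assumption, not necessarily the one used by the authors.
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from tab-separated VCF lines, one key per ALT allele."""
    variants: Set[Variant] = set()
    for line in lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list several ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt != ".":
                variants.add((chrom, pos, ref, alt))
    return variants


def concordance(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared variants divided by all variants in either set (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Under these assumptions, `a - b` gives the variants private to one call set, which is the group the abstract argues is enriched for false positives when post-processing is skipped.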

Updated: 2020-01-01