Genomic loci susceptible to systematic sequencing bias in clinical whole genomes.
Genome Research (IF 7), Pub Date: 2020-03-01, DOI: 10.1101/gr.255349.119
Timothy M Freeman 1, Dennis Wang 1, 2, 3, Jason Harris 4

Accurate massively parallel sequencing (MPS) of genetic variants is key to many areas of science and medicine, such as cataloging population genetic variation and diagnosing genetic diseases. Certain genomic positions can be prone to higher rates of systematic sequencing and alignment bias that limit accuracy, resulting in false positive variant calls. Current standard practices to differentiate between loci that can and cannot be sequenced with high confidence utilize consensus between different sequencing methods as a proxy for sequencing confidence. These practices have significant limitations, and alternative methods are required to overcome them. We have developed a novel statistical method based on summarizing sequenced reads from whole-genome clinical samples and cataloging them in "Incremental Databases" that maintain individual confidentiality. Allele statistics were cataloged for each genomic position that consistently showed systematic biases with the corresponding MPS sequencing pipeline. We found systematic biases present at ∼1%-3% of the human autosomal genome across five patient cohorts. We identified which genomic regions were more or less prone to systematic biases, including large homopolymer flanks (odds ratio = 23.29-33.69) and the NIST high confidence genomic regions (odds ratio = 0.154-0.191). We confirmed our predictions on a gold-standard reference genome and showed that these systematic biases can lead to suspect variant calls within clinical panels. Our results recommend increased caution to address systematic biases in whole-genome sequencing and alignment. This study provides the implementation of a simple statistical approach to enhance quality control of clinically sequenced samples by flagging variants at suspect loci for further analysis or exclusion.
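
The abstract does not spell out the statistic used, so the following is only a minimal sketch of the general idea: per-position allele statistics are accumulated as running aggregates (so no individual sample's reads need to be retained), and positions whose non-reference allele fraction is consistently elevated across samples are flagged. The class `PositionSummary`, the function `is_suspect`, and all thresholds are hypothetical names and values chosen for illustration, not the authors' implementation. The `odds_ratio` helper shows the standard 2x2 calculation behind enrichment figures like those quoted for homopolymer flanks.

```python
import math
from dataclasses import dataclass


@dataclass
class PositionSummary:
    """Running aggregate of non-reference allele fractions at one locus.

    Only sums are stored, so individual samples cannot be recovered
    from the aggregate (hypothetical stand-in for an incremental database row).
    """
    n: int = 0              # samples added so far
    total: float = 0.0      # sum of per-sample allele fractions
    total_sq: float = 0.0   # sum of squared allele fractions

    def add_sample(self, alt_reads: int, depth: int) -> None:
        """Fold one sample's read counts into the aggregate."""
        if depth <= 0:
            return  # no coverage: nothing to add
        frac = alt_reads / depth
        self.n += 1
        self.total += frac
        self.total_sq += frac * frac

    @property
    def mean(self) -> float:
        return self.total / self.n if self.n else 0.0

    @property
    def std(self) -> float:
        """Sample standard deviation computed from the running sums."""
        if self.n < 2:
            return 0.0
        var = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


def is_suspect(summary: PositionSummary,
               min_samples: int = 3,
               min_mean: float = 0.05,
               max_std: float = 0.05) -> bool:
    """Illustrative rule (assumed thresholds): a locus is suspect when the
    non-reference fraction is elevated and stable across samples."""
    return (summary.n >= min_samples
            and summary.mean >= min_mean
            and summary.std <= max_std)


def odds_ratio(suspect_in: int, clean_in: int,
               suspect_out: int, clean_out: int) -> float:
    """Odds ratio for suspect loci inside versus outside a genomic region."""
    return (suspect_in * clean_out) / (clean_in * suspect_out)
```

Storing only the count, sum, and sum of squares is one simple way to let new samples be added incrementally while keeping the stored record free of per-individual data; a real pipeline would also need to distinguish consistent low-level bias from genuine common variants, which this sketch does not attempt.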

Updated: 2020-03-01