Thousands of missing variants in the UK Biobank are recoverable by genome realignment
Annals of Human Genetics (IF 1.0), Pub Date: 2020-03-31, DOI: 10.1111/ahg.12383
Tongqiu Jia 1, Brenton Munson 1, Hana Lango Allen 2, Trey Ideker 1, Amit R Majithia 1

The UK Biobank is an unprecedented resource for human disease research. In March 2019, 49,997 exomes were made publicly available to investigators. Here we note that thousands of variant calls are unexpectedly absent from this dataset, with 641 genes showing zero variation. We show that the reason for this was an erroneous read alignment to the GRCh38 reference. The missing variants can be recovered by modifying read alignment parameters to correctly handle the expanded set of contigs available in the human genome reference. Given the size and complexity of such population scale datasets, we propose a simple heuristic that can uncover systematic errors using summary data accessible to most investigators.
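The abstract does not spell out the heuristic. As a minimal sketch, assuming it amounts to checking summary data for targeted genes that carry no variant calls at all, the check could look like the following. The function name, the input shape (one gene name per variant call), and the toy gene names are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def genes_without_variants(target_genes, variant_genes):
    """Return targeted genes with no variant call at all, sorted by name.

    target_genes:  iterable of gene names covered by the exome capture
    variant_genes: iterable with one gene name per variant call
    Both inputs are hypothetical summary data, not a UK Biobank format.
    """
    counts = Counter(variant_genes)
    # A Counter returns 0 for genes that never appear among the calls.
    return sorted(g for g in set(target_genes) if counts[g] == 0)


# Toy, made-up input: the gene names and calls are illustrative only.
targets = ["GENE_A", "GENE_B", "GENE_C"]
calls = ["GENE_A", "GENE_A", "GENE_C"]
print(genes_without_variants(targets, calls))
```

In a cohort of tens of thousands of exomes, a well-covered gene with zero variation is unlikely to be a biological finding, so a long list from a check like this would point toward a pipeline problem such as the alignment error described above.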

Updated: 2020-03-31