A diploid assembly-based benchmark for variants in the major histocompatibility complex.
Nature Communications (IF 14.7), Pub Date: 2020-09-22, DOI: 10.1038/s41467-020-18564-9
Chen-Shan Chin 1 , Justin Wagner 2 , Qiandong Zeng 3 , Erik Garrison 4 , Shilpa Garg 5 , Arkarachai Fungtammasan 1 , Mikko Rautiainen 6, 7, 8 , Sergey Aganezov 9 , Melanie Kirsche 9 , Samantha Zarate 9 , Michael C Schatz 9, 10 , Chunlin Xiao 11 , William J Rowell 12 , Charles Markello 4 , Jesse Farek 13 , Fritz J Sedlazeck 13 , Vikas Bansal 14 , Byunggil Yoo 15 , Neil Miller 15 , Xin Zhou 16 , Andrew Carroll 17 , Alvaro Martinez Barrio 18 , Marc Salit 19 , Tobias Marschall 20 , Alexander T Dilthey 21 , Justin M Zook 2

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, more complex variation than regions covered by previous benchmarks.
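
The abstract separates small variants (smaller than 50 bp) from structural variants. A minimal Python sketch of that size split is given here for illustration only. The `Variant` type, the `variant_size` rule, and the function names are hypothetical and are not taken from the paper's pipeline. The only value drawn from the abstract is the 50 bp boundary.

```python
from dataclasses import dataclass

# Boundary taken from the abstract: small variants are "smaller than 50 bp".
SMALL_VARIANT_LIMIT_BP = 50


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (position, REF allele, ALT allele)."""
    pos: int
    ref: str
    alt: str


def variant_size(v: Variant) -> int:
    """Assumed size rule: length difference for indels, REF length otherwise."""
    if len(v.ref) != len(v.alt):
        return abs(len(v.alt) - len(v.ref))
    return len(v.ref)


def classify(v: Variant) -> str:
    """Label a variant 'small' if its size is under 50 bp, else 'structural'."""
    return "small" if variant_size(v) < SMALL_VARIANT_LIMIT_BP else "structural"


if __name__ == "__main__":
    examples = [
        Variant(100, "A", "G"),             # single-base substitution
        Variant(200, "A", "A" + "T" * 60),  # 60 bp insertion
        Variant(300, "A" * 50, "A"),        # 49 bp deletion
    ]
    for v in examples:
        print(v.pos, variant_size(v), classify(v))
```

Other tools may measure variant size differently (for example, by the longer allele), so the assumed rule in `variant_size` should be adjusted to match whichever convention a given callset uses.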




Updated: 2020-09-22