Investigating the impact of reference assembly choice on genomic analyses in a cattle breed
bioRxiv - Genomics Pub Date : 2021-01-17 , DOI: 10.1101/2021.01.15.426838
Audald Lloret-Villas , Meenu Bhati , Naveen Kumar Kadri , Ruedi Fries , Hubert Pausch

Background: Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exceed the current bovine reference genome in quality and continuity have been assembled for different breeds of cattle. Using whole genome sequencing data of 161 Brown Swiss cattle, we compared the accuracy of read mapping and sequence variant genotyping as well as downstream genomic analyses between the bovine reference genome (ARS-UCD1.2) and a highly continuous Angus-based assembly (UOA Angus 1).

Results: Read mapping accuracy did not differ notably between the ARS-UCD1.2 and UOA Angus 1 assemblies. We discovered 22,744,517 and 22,559,675 high-quality variants from ARS-UCD1.2 and UOA Angus 1, respectively. The concordance between sequence- and array-called genotypes was high and the number of variants deviating from Hardy-Weinberg proportions was low at segregating sites for both assemblies. More artefactual INDELs were genotyped from UOA Angus 1 than ARS-UCD1.2 alignments. Using the composite likelihood ratio test, we detected 40 and 33 signatures of selection from ARS-UCD1.2 and UOA Angus 1, respectively, but the overlap between both assemblies was low. Using the 161 sequenced Brown Swiss cattle as a reference panel, we imputed sequence variant genotypes into a mapping cohort of 30,499 cattle that had microarray-derived genotypes. The accuracy of imputation (Beagle R2) was very high (0.87) for both assemblies. Genome-wide association studies between imputed sequence variant genotypes and six dairy traits as well as stature produced almost identical results from both assemblies.

Conclusions: The ARS-UCD1.2 and UOA Angus 1 assemblies are suitable for reference-guided genome analyses in Brown Swiss cattle. Although differences in read mapping and genotyping accuracy between both assemblies are negligible, the choice of the reference genome has a large impact on detecting signatures of selection using the composite likelihood ratio test. We developed a workflow that can be adapted and reused to compare the impact of reference genomes on genome analyses in various breeds, populations and species.
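
The abstract counts variants that deviate from Hardy-Weinberg proportions but does not say which test was used. As an illustration only, the sketch below computes a chi-square goodness-of-fit statistic from the three genotype counts at one biallelic site. The function name and the example counts are hypothetical and are not taken from the paper or its workflow; the authors may well have used a different (for example exact) test.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic (1 degree of freedom) for deviation from
    Hardy-Weinberg proportions at a single biallelic site.

    n_aa, n_ab, n_bb are observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Hypothetical counts for a sample of 161 animals (not data from the paper).
    balanced = hwe_chi_square(40, 81, 40)
    all_heterozygous = hwe_chi_square(0, 161, 0)
    print(f"balanced site: {balanced:.4f}")
    print(f"all-heterozygous site: {all_heterozygous:.1f}")
```

A statistic above roughly 3.84 corresponds to p < 0.05 at one degree of freedom. In practice a much stricter threshold is typical when millions of sites are tested, and an excess of heterozygotes like the second example is a common signature of mapping or genotyping artefacts.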

Updated: 2021-01-18