Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
Genetics Selection Evolution ( IF 3.6 ) Pub Date : 2020-04-06 , DOI: 10.1186/s12711-020-00536-8
Roger Ros-Freixedes 1, 2 , Andrew Whalen 1 , Ching-Yi Chen 3 , Gregor Gorjanc 1 , William O Herring 3 , Alan J Mileham 4 , John M Hickey 1

The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation method hybrid peeling in real animal breeding settings.

We used data from four pig populations of different sizes (18,349 to 107,815 individuals) that were widely genotyped at densities between 15,000 and 75,000 markers genome-wide. Around 2% of the individuals in each population were sequenced. Most of them were sequenced at 1× or 2×, and 37–92 individuals per population (284 in total) were sequenced at 15–30×. We imputed whole-genome sequence data with hybrid peeling. We evaluated the imputation accuracy by removing the sequence data of the 284 individuals with high coverage, using a leave-one-out design. We also simulated data that mimicked the sequencing strategy used in the real populations and used regression trees to quantify the factors that affected the individual-wise and variant-wise imputation accuracies.

Imputation accuracy was high for the majority of individuals in all four populations (median individual-wise dosage correlation: 0.97). Imputation accuracy was lower for individuals in the earliest generations of each population than for the rest, because marker array data were lacking for these individuals and their ancestors. The main factors that determined the individual-wise imputation accuracy were the genotyping status of the individual, the availability of marker array data for its immediate ancestors, and its degree of connectedness to the rest of the population, whereas the sequencing coverage of relatives had no effect. The main factors that determined the variant-wise imputation accuracy were the minor allele frequency and the number of individuals with sequencing coverage at each variant site. These results were validated against the empirical observations.
We demonstrate that the coupling of an appropriate sequencing strategy and hybrid peeling is a powerful approach for generating whole-genome sequence data with high accuracy in large pedigreed populations in which only a small fraction of individuals (2%) have been sequenced, mostly at low coverage. This is a critical step for the successful implementation of whole-genome sequence data for genomic prediction and fine-mapping of causal variants.
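
The accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes that the individual-wise and variant-wise dosage correlations are plain Pearson correlations between true genotypes (coded 0, 1, 2) and imputed allele dosages (real values between 0 and 2); the published analysis may apply additional corrections that are not shown here. The function names and the toy data are hypothetical and are not taken from the paper or from any hybrid peeling software.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None when the correlation is undefined (fewer than two
    values, or no variation in either sequence).
    """
    n = len(x)
    if n != len(y) or n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def individual_wise(true_genotypes, imputed_dosages):
    """One correlation per individual (row), computed across variants."""
    return [pearson(t, d) for t, d in zip(true_genotypes, imputed_dosages)]


def variant_wise(true_genotypes, imputed_dosages):
    """One correlation per variant (column), computed across individuals."""
    true_columns = zip(*true_genotypes)
    dosage_columns = zip(*imputed_dosages)
    return [pearson(t, d) for t, d in zip(true_columns, dosage_columns)]


# Hypothetical toy data: 3 individuals (rows) by 4 variants (columns).
true_genotypes = [
    [0, 1, 2, 1],
    [2, 2, 0, 1],
    [1, 0, 1, 2],
]
imputed_dosages = [
    [0.1, 0.9, 1.8, 1.2],
    [1.9, 1.7, 0.2, 0.8],
    [1.0, 0.3, 1.1, 1.6],
]

print(individual_wise(true_genotypes, imputed_dosages))
print(variant_wise(true_genotypes, imputed_dosages))
```

In a leave-one-out evaluation such as the one described above, the "true" rows would be the high-coverage genotypes of each held-out individual and the dosages would come from an imputation run that excluded that individual's sequence data. Returning None for variants without variation is a design choice of this sketch, since a correlation cannot be computed for a site where every validation individual has the same genotype.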

Updated: 2020-04-22