Optimisation of the core subset for the APY approximation of genomic relationships
Genetics Selection Evolution (IF 3.6), Pub Date: 2022-11-22, DOI: 10.1186/s12711-022-00767-x
Ivan Pocrnic¹, Finn Lindgren², Daniel Tolhurst¹, William O Herring³, Gregor Gorjanc¹

As we enter the era of mega-scale genomics, standard genomic evaluation models face many computational issues because of their dense data structures and cubic computational complexity. Several scalable approaches have been proposed to address this challenge, such as the Algorithm for Proven and Young (APY). In APY, genotyped animals are partitioned into core and non-core subsets, which induces a sparser inverse of the genomic relationship matrix. This partitioning is often done at random. While APY is a good approximation of the full model, random partitioning can make results unstable, possibly affecting accuracy or even reranking animals. Here we present a stable optimisation of the core subset that chooses the animals with the most informative genotype data. We derived a novel algorithm for optimising the core subset based on a conditional genomic relationship matrix or a conditional single nucleotide polymorphism (SNP) genotype matrix. We compared the accuracy of genomic predictions with different core subsets for simulated and real pig data sets. The core subsets were constructed (1) at random, (2) based on the diagonal of the genomic relationship matrix, (3) at random with weights from (2), or (4) based on the novel conditional algorithm. To understand the different core subset constructions, we visualised the population structure of the genotyped animals with linear Principal Component Analysis and non-linear Uniform Manifold Approximation and Projection. All core subset constructions performed equally well, in both simulated and real data sets, when the number of core animals captured most of the variation in the genomic relationships. When the number of core animals was not sufficiently large, the random construction produced substantial variability in the results, whereas the conditional construction produced none.
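The four core-subset constructions compared above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and not the authors' implementation. The function names are hypothetical. `G` is assumed to be a dense, symmetric, positive semi-definite genomic relationship matrix that fits in memory. The conditional construction is written here as a greedy pivoted Cholesky factorisation: at each step it adds the animal with the largest conditional variance given the animals already in the core. We assume this captures the spirit of the conditional algorithm. The published algorithm may differ in details such as the stopping rule or working on the SNP genotype matrix instead of `G`. The toy `G = ZZ'/m` below is also a simplification.

```python
import numpy as np


def core_random(n_animals, n_core, rng):
    """(1) Random construction: core animals drawn uniformly without replacement."""
    return rng.choice(n_animals, size=n_core, replace=False)


def core_diagonal(G, n_core):
    """(2) Diagonal construction: animals with the largest diagonal elements of G."""
    return np.argsort(-np.diag(G), kind="stable")[:n_core]


def core_weighted_random(G, n_core, rng):
    """(3) Random construction weighted by diag(G); assumes all diagonals are positive."""
    d = np.diag(G)
    return rng.choice(G.shape[0], size=n_core, replace=False, p=d / d.sum())


def core_conditional(G, n_core, tol=1e-10):
    """(4) Conditional construction, sketched here as greedy pivoted Cholesky.

    At each step, add the animal with the largest conditional variance given
    the animals already in the core, then update the conditional variances
    of all animals. Stops early if no animal has conditional variance > tol.
    """
    n = G.shape[0]
    cond_var = np.diag(G).astype(float)        # conditional variances, start at diag(G)
    L = np.zeros((n, n_core))                  # columns of the partial Cholesky factor
    core = []
    for j in range(n_core):
        i = int(np.argmax(cond_var))
        if cond_var[i] <= tol:                 # remaining animals add no new information
            break
        core.append(i)
        # Covariances with animal i, conditional on earlier core animals, scaled.
        L[:, j] = (G[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(cond_var[i])
        cond_var -= L[:, j] ** 2               # Schur-complement update of the diagonal
        cond_var[core] = 0.0                   # guard against round-off for chosen animals
    return np.array(core, dtype=int)


# Usage on a small simulated genotype matrix (hypothetical data, 0/1/2 coding).
rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(30, 8)).astype(float)
Z -= Z.mean(axis=0)                            # centre each SNP column
G = Z @ Z.T / Z.shape[1]                       # simple G = ZZ'/m, rank <= 8 here
print(core_conditional(G, 20))                 # at most rank(G) animals are returned
```

The conditional sketch makes no random draws, so repeated calls return the same core. This mirrors the repeatability contrasted with the random construction. Once every remaining conditional variance is numerically zero, further animals carry no new information and the sketch stops early.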
Visualisation of the population structure and chosen core animals showed that the conditional construction spreads core animals across the whole domain of genotyped animals in a repeatable manner. Our results confirm that the size of the core subset in APY is critical. Furthermore, they show that the core subset can be optimised with the conditional algorithm, which achieves an optimal and repeatable spread of core animals across the domain of genotyped animals.
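The sparser inverse that APY induces can be illustrated with the APY inverse as it is commonly written. With core animals c and non-core animals n, G_APY^-1 = [[Gcc^-1, 0], [0, 0]] + [[-Gcc^-1 Gcn], [I]] M^-1 [[-Gnc Gcc^-1, I]]. Here M is diagonal with m_ii = g_ii - g_ic Gcc^-1 g_ci, the conditional variance of non-core animal i given the core. The following is a minimal NumPy sketch, assuming a dense, positive definite `G` held in memory. The function name `apy_inverse` and the example core subset are hypothetical. A production implementation would not form dense n × n matrices.

```python
import numpy as np


def apy_inverse(G, core):
    """APY approximation of the inverse of G for a given core subset.

    Uses G_APY^{-1} = [[Gcc^{-1}, 0], [0, 0]]
                    + [[-Gcc^{-1} Gcn], [I]] M^{-1} [[-Gnc Gcc^{-1}, I]],
    where M is diagonal with m_ii = g_ii - g_ic Gcc^{-1} g_ci.
    Only Gcc is inverted densely; the non-core block of the result is diagonal.
    """
    n = G.shape[0]
    core = np.asarray(core, dtype=int)
    non = np.setdiff1d(np.arange(n), core)     # non-core animals
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, non)]
    P = Gcc_inv @ Gcn                          # regression of non-core on core animals
    m = np.diag(G)[non] - np.einsum("ij,ij->j", Gcn, P)  # conditional variances
    c = len(core)
    inv_perm = np.zeros((n, n))                # result in (core, non-core) order
    inv_perm[:c, :c] = Gcc_inv
    B = np.vstack([-P, np.eye(len(non))])
    inv_perm += (B / m) @ B.T                  # B M^{-1} B' with M diagonal
    order = np.concatenate([core, non])
    out = np.empty((n, n))
    out[np.ix_(order, order)] = inv_perm       # back to the original animal order
    return out


# Usage with a small positive definite G (hypothetical data).
rng = np.random.default_rng(0)
Z = rng.normal(size=(12, 5))
G = Z @ Z.T / 5 + 0.1 * np.eye(12)             # ridge keeps G positive definite
core = [0, 3, 7, 9]                            # hypothetical core subset
Ginv_apy = apy_inverse(G, core)
```

In this sketch the only dense inverse is that of the c × c core block. The non-core by non-core block of the result is diagonal, which is where the sparsity comes from. The choice and number of core animals determine the conditional variances in M, and with them the quality of the approximation.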

Updated: 2022-11-23