A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP
Genetics Selection Evolution (IF 4.1), Pub Date: 2022-05-20, DOI: 10.1186/s12711-022-00726-6
Rostam Abdollahi-Arpanahi, Daniela Lourenco, Ignacy Misztal

The algorithm for proven and young (APY) has been suggested as a solution for recursively computing a sparse representation of the inverse of a large genomic relationship matrix (G). In APY, a subset of genotyped individuals is used as the core and the remaining genotyped individuals are treated as noncore. The size and definition of the core are relevant research subjects for the application of APY, especially given the ever-increasing number of genotyped individuals. The aim of this study was to investigate several core definitions: the most popular animals (MPA; i.e., animals with high contributions to the genetic pool), the least popular males (LPM), the least popular females (LPF), a random set (Rnd), animals evenly distributed across genealogical paths (Ped), unrelated individuals (Unrel), within-family selection (Fam), and decomposition of the gene content matrix (QR). Each definition was evaluated for six core sizes based on the prediction accuracy of single-step genomic best linear unbiased prediction (ssGBLUP) with APY. The prediction accuracy of ssGBLUP with the full inverse of G was used as the baseline. The dataset consisted of 357k pedigreed Duroc pigs, of which 111k had genotypes, and ~220k phenotypic records. When the core size was equal to the number of largest eigenvalues explaining 50% of the variation of G (n = 160), the MPA and Ped core definitions delivered the highest average prediction accuracies (~0.41 to 0.53). As the core size increased to the number of eigenvalues explaining 99% of the variation in G (n = 7320), prediction accuracy was nearly identical for all core types, and correlations with genomic estimated breeding values (GEBV) from ssGBLUP with the full inverse of G were greater than 0.99 for all core definitions. Cores that represent all generations, such as Rnd, Ped, Fam, and Unrel, were grouped together in the hierarchical clustering of GEBV.
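
To make the core/noncore split described above concrete, the following is a minimal dense sketch of the commonly cited APY inverse, in which noncore animals are assumed to be conditionally independent given the core. The function name `apy_inverse` and the use of NumPy are assumptions for illustration only. The study's own software is not described in this abstract, and a production implementation would store the result sparsely instead of as a full n × n array.

```python
import numpy as np

def apy_inverse(G, core):
    """Dense illustration of the APY inverse of G for a given list of core indices.

    Hypothetical helper, not the authors' code. Only the core block is
    inverted directly; each noncore animal contributes one diagonal element.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])  # direct inverse of the core block
    Gcn = G[np.ix_(core, noncore)]                  # core x noncore relationships
    P = Gcc_inv @ Gcn                               # regressions of noncore on core

    # Diagonal matrix M: m_i = g_ii - g_ic Gcc^{-1} g_ci for each noncore animal i
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)
    Minv = 1.0 / m

    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P * Minv) @ P.T
    Ginv[np.ix_(core, noncore)] = -P * Minv
    Ginv[np.ix_(noncore, core)] = (-P * Minv).T
    Ginv[noncore, noncore] = Minv                   # noncore block is diagonal
    return Ginv
```

In this sketch the noncore-by-noncore block is diagonal, which is where the sparsity comes from: the cost of the direct inversion depends on the core size, not on the total number of genotyped animals.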
For small core sizes, the definition of the core matters; however, as the size of the core reaches an optimal value equal to the number of largest eigenvalues explaining 99% of the variation of G, the definition of the core becomes arbitrary.
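
The core sizes above are defined as the number of largest eigenvalues of G that explain a given share of its variation. A minimal sketch of that count is shown below. The function name `core_size_from_eigenvalues` and the full dense eigendecomposition are assumptions for illustration; for a matrix with 111k genotyped animals the eigenvalues would more plausibly be obtained another way, and the abstract does not say how.

```python
import numpy as np

def core_size_from_eigenvalues(G, fraction=0.99):
    """Number of largest eigenvalues of symmetric G explaining `fraction` of its variation.

    Hypothetical helper, not the authors' code.
    """
    eigvals = np.linalg.eigvalsh(G)[::-1]      # eigvalsh returns ascending order; reverse it
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative share reaches the requested fraction
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(eigvals))

# Toy example with eigenvalues 6, 3, 1 (cumulative shares 0.6, 0.9, 1.0)
G_toy = np.diag([6.0, 3.0, 1.0])
print(core_size_from_eigenvalues(G_toy, 0.50))  # one eigenvalue already covers 60%
print(core_size_from_eigenvalues(G_toy, 0.95))  # all three are needed
```

Applied to the G of this study, the abstract reports 160 eigenvalues at the 50% level and 7320 at the 99% level.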

Updated: 2022-05-22