Approximate Genome-Based Kernel Models for Large Data Sets Including Main Effects and Interactions
Frontiers in Genetics (IF 2.8), Pub Date: 2020-08-28, DOI: 10.3389/fgene.2020.567757
Jaime Cuevas, Osval A. Montesinos-López, J. W. R. Martini, Paulino Pérez-Rodríguez, Morten Lillemo, Jose Crossa

The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and genomic selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), the computational cost of handling large genomic kernel relationship matrices (inverting and decomposing them) grows rapidly. The problem worsens when genomic × environment interaction (G × E) and multi-trait kernels are included in the model. In this research we propose selecting a small number of lines m (m < n) to construct an approximate kernel of lower rank than the original, thereby substantially decreasing the required computing time. First, we describe the full genomic method for a single environment (FGSE), with a covariance matrix (kernel) that includes all n lines. Second, we select m lines and approximate the original kernel for the single-environment model (APSE). Similarly, but including main effects and G × E, we describe the full genomic method with the genotype × environment model (FGGE) and, using m lines, the approximate kernel method with G × E (APGE). We applied the proposed method to two wheat data sets of different sizes (n) using the standard linear kernel Genomic Best Linear Unbiased Predictor (GBLUP) and also using eigenvalue decomposition. In both data sets, we compared the prediction performance and computing time of FGSE versus APSE and of FGGE versus APGE. Results showed a competitive prediction performance of the approximate methods with a significant reduction in computing time. Genomic prediction accuracy depends on the decay of the eigenvalues of the original kernel (the amount of variance information lost) as well as on the number of selected lines m.
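
The abstract does not spell out how the lower-rank kernel is built from the m selected lines. One common construction of this kind is the Nyström approximation, K ≈ K_nm K_mm⁺ K_mn, which is assumed here for illustration and is not necessarily the authors' exact procedure. The sketch below builds a GBLUP-style linear kernel from simulated marker codes and approximates it from m randomly chosen lines. All function names, the simulated data, and the random selection of lines are hypothetical choices made for this example.

```python
import numpy as np

def linear_kernel(X):
    """GBLUP-style linear kernel K = Xc Xc' / p from column-centered markers."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def nystrom_approx(K, idx):
    """Rank-m approximation K ~ K_nm K_mm^+ K_mn built from the lines in idx.

    Assumption: a Nystrom-type construction; the paper may differ in detail.
    """
    K_nm = K[:, idx]               # n x m block: all lines vs. selected lines
    K_mm = K[np.ix_(idx, idx)]     # m x m block: selected lines only
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

# Simulated data (hypothetical sizes): n lines, p markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p, m = 200, 500, 50
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))

K = linear_kernel(X)
idx = rng.choice(n, size=m, replace=False)   # random selection of m lines
K_hat = nystrom_approx(K, idx)

# Relative Frobenius error: the information lost by keeping only m lines.
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
rank_hat = np.linalg.matrix_rank(K_hat)
print(f"rank of approximation: {rank_hat}, relative error: {rel_err:.3f}")
```

In this construction only the m × m block needs to be inverted or decomposed, which is where the saving in computing time comes from. The approximation error depends on how quickly the eigenvalues of K decay and on the choice of m, in line with the abstract's closing remark.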




Updated: 2020-10-16