Entropy and mutual information in genome-wide selection: the splitting of k-fold cross-validation sets and implications for tree breeding
Tree Genetics & Genomes (IF 1.9), Pub Date: 2020-03-28, DOI: 10.1007/s11295-020-01430-6
Guilherme Ferreira Simiqueli, Marcos Deon Vilela de Resende

Random k-fold cross-validation in genome-wide selection (GWS) can help to estimate predictive ability (\( {r}_{y\hat{y}} \)). Predictive ability tends to be higher when the training and validation sets share a high degree of kinship. However, many tree breeding populations are less genetically related to the training sets and have different levels of phenotypic diversity. Therefore, this study proposes methods of splitting k-fold cross-validation sets to optimize \( {r}_{y\hat{y}} \) estimates so that they are consistent with the breeding population, and to verify the impact of phenotypic and genotypic distribution on GWS. Using a simulated Eucalyptus trait (\( h^2 = 0.5 \)) and Pinus taeda L. data for diameter at breast height (\( h^2 = 0.31 \)), six methods were developed based on mutual information (I) and entropy (H) for measuring genetic similarity and phenotypic dissimilarity, respectively. All methods were evaluated for \( {r}_{y\hat{y}} \), bias, minimum squared error of prediction, and genomic heritability. The Pearson correlations of these parameters with the kinship coefficient, and with I and H between and within training and validation sets, were also estimated. Our results show that closer genetic similarity did not significantly increase \( {r}_{y\hat{y}} \), and that a lower H reduced \( {r}_{y\hat{y}} \) and led to overestimated genomic breeding values. Consequently, phenotypic diversity (high H) should be added to tree breeding populations to increase genetic gain and reduce bias. The new methods accurately fitted models according to the entropy of tree breeding populations and their genetic relationship to the training sets. Therefore, these methods provided usable estimates of genetic gain to support the consistent success of long-term tree breeding programs.
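
The abstract does not say how I and H were estimated, so the following is only a minimal sketch of the two quantities it names, under stated assumptions: phenotypes are discretized into equal-width bins before computing Shannon entropy, and mutual information is computed between two individuals' SNP genotype vectors coded 0/1/2. The function names, the binning rule, and the genotype coding are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bin_phenotypes(values, n_bins=4):
    """Map each phenotype to one of n_bins equal-width bins.

    Assumption: equal-width binning is an illustrative choice only.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete vectors.

    Assumption: x and y are SNP genotypes coded 0/1/2 for two individuals.
    """
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


# Usage: a phenotypically diverse set has higher H than a uniform one.
diverse = bin_phenotypes([10.2, 14.8, 19.5, 24.1, 28.9, 33.0])
uniform = bin_phenotypes([20.0, 20.1, 20.0, 20.1, 20.0, 20.1], n_bins=1)
print(shannon_entropy(diverse), shannon_entropy(uniform))

# Usage: identical genotype vectors share more information than unrelated ones.
geno_a = [0, 1, 2, 0, 1, 2]
geno_b = [0, 0, 1, 1, 2, 2]
print(mutual_information(geno_a, geno_a), mutual_information(geno_a, geno_b))
```

In a splitting scheme of the kind the abstract describes, values like these could be used to score candidate training and validation folds by genetic similarity (I) and phenotypic diversity (H); how the six published methods combine them is not stated here.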




Updated: 2020-03-28