Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes.
Genetics Selection Evolution (IF 4.1), Pub Date: 2020-02-24, DOI: 10.1186/s12711-020-00531-z
Rostam Abdollahi-Arpanahi 1 , Daniel Gianola 2 , Francisco Peñagaricano 1, 3

BACKGROUND Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.

METHODS The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000).

RESULTS In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using mean squared error of prediction. The simulations indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.

CONCLUSIONS For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
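
The abstract ranks the methods by predictive correlation and by mean squared error of prediction. The minimal Python sketch below shows how these two metrics are commonly computed, assuming predictive correlation means the Pearson correlation between observed and predicted phenotypes in a validation set (the abstract does not spell out the definition). The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def predictive_correlation(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of cross-products and squares around the means.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_obs * ss_pred)


def mean_squared_error_of_prediction(observed, predicted):
    """Average squared difference between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


# Toy validation-set values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

r = predictive_correlation(observed, predicted)
msep = mean_squared_error_of_prediction(observed, predicted)
print(f"predictive correlation = {r:.3f}, MSEP = {msep:.3f}")
```

A higher correlation and a lower mean squared error both indicate better prediction, which is why the abstract can report the same ranking of methods under either metric.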

Updated: 2020-04-22