Machine Learning as an Effective Method for Identifying True Single Nucleotide Polymorphisms in Polyploid Plants.
The Plant Genome (IF 3.9), Pub Date: 2019-03-01, DOI: 10.3835/plantgenome2018.05.0023
Walid Korani, Josh P. Clevenger, Ye Chu, Peggy Ozias‐Akins

Single nucleotide polymorphisms (SNPs) have many advantages as molecular markers since they are ubiquitous and codominant. However, the discovery of true SNPs in polyploid species is difficult. Peanut (Arachis hypogaea L.) is an allopolyploid, which has a very low rate of true SNP calling. A large set of true and false SNPs identified from the Axiom_Arachis 58k array was leveraged to train machine‐learning models to enable identification of true SNPs directly from sequence data to reduce ascertainment bias. These models achieved accuracy rates above 80% using real peanut RNA sequencing (RNA‐seq) and whole‐genome shotgun (WGS) resequencing data, which is higher than previously reported for polyploids and at least a twofold improvement for peanut. A 48K SNP array, Axiom_Arachis2, was designed using this approach resulting in 75% accuracy of calling SNPs from different tetraploid peanut genotypes. Using the method to simulate SNP variation in several polyploids, models achieved >98% accuracy in selecting true SNPs. Additionally, models built with simulated genotypes were able to select true SNPs at >80% accuracy using real peanut data. This work accomplished the objective to create an effective approach for calling highly reliable SNPs from polyploids using machine learning. A novel tool was developed for predicting true SNPs from sequence data, designated as SNP machine learning (SNP‐ML), using the described models. The SNP‐ML additionally provides functionality to train new models not included in this study for customized use, designated SNP machine learner (SNP‐MLer). The SNP‐ML is publicly available.
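The workflow summarized in the abstract (train a classifier on SNPs already labeled true or false, then measure accuracy on SNPs the model has not seen) can be illustrated with a minimal sketch. This is not the SNP‐ML implementation: the abstract does not name the model type or the input features, so the logistic regression, the two numeric features, and every function name below are assumptions chosen only to keep the example self-contained. The data are synthetic, not peanut data.

```python
import math
import random


def make_synthetic_snps(n, rng):
    """Generate n labeled candidate SNPs with two made-up numeric features.

    Assumption: the features are placeholders drawn from overlapping
    Gaussians; they do not correspond to the features used in the paper.
    Label 1 = "true SNP", label 0 = "false SNP".
    """
    features, labels = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        center = 1.0 if label == 1 else -1.0
        features.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(label)
    return features, labels


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (true SNP) if the predicted probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(w, b, X, y):
    """Fraction of candidates whose predicted label matches the known label."""
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = make_synthetic_snps(400, rng)
    X_test, y_test = make_synthetic_snps(200, rng)
    w, b = train_logistic(X_train, y_train)
    print("held-out accuracy:", accuracy(w, b, X_test, y_test))
```

Accuracy here is computed on candidates held out from training, which is the sense in which the abstract's accuracy figures are most naturally read. The number this sketch prints reflects only the synthetic data and says nothing about the results reported above.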

Updated: 2019-03-01