Non‐homology‐based prediction of gene functions in maize (Zea mays ssp. mays)
The Plant Genome ( IF 4.219 ) Pub Date : 2020-04-29 , DOI: 10.1002/tpg2.20015
Xiuru Dai 1, 2 , Zheng Xu 3 , Zhikai Liang 2 , Xiaoyu Tu 4 , Silin Zhong 4 , James C. Schnable 2 , Pinghua Li 1

Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non‐homology gene features. Among the eight supervised classification algorithms evaluated, random‐forest‐based prediction consistently provided the most accurate gene function prediction. Non‐homology‐based functional annotation provides complementary strengths to homology‐based annotation, with higher average performance in Biological Process GO terms, the domain where homology‐based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology‐based functional annotation is highest. GO prediction models trained with homology‐based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non‐homology‐based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology‐based functional annotations.

Updated: 2020-04-29