Predicting gene phenotype by multi-label multi-class model based on essential functional features
Molecular Genetics and Genomics (IF 3.1), Pub Date: 2021-04-29, DOI: 10.1007/s00438-021-01789-8
Lei Chen 1,2, Zhandong Li 3, Tao Zeng 4, Yu-Hang Zhang 5, Hao Li 3, Tao Huang 6, Yu-Dong Cai 1

Phenotype is one of the most important concepts in genetics; it describes all observable characteristics of a research object. Because a phenotype reflects the combined effects of genotype and environmental factors, phenotype characteristics are hard to define, and unknown phenotypes are even harder to predict. With current biological techniques, obtaining sufficient structural information on large numbers of phenotype-associated genes/proteins remains expensive and time-consuming. Various bioinformatics methods have been proposed to address this problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. However, general functional descriptions have highly complicated inner structures, which makes them difficult to use directly for phenotype prediction. To address this issue and improve prediction for more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG, and then use node2vec to learn functional embedding features of genes from a gene–gene network. All these features are analyzed with feature selection methods (Boruta, minimum redundancy maximum relevance) to generate a feature list. This list is fed into incremental feature selection, which incorporates multi-label classifiers built from RAkEL and several classic base classifiers, to build an optimal multi-label multi-class classification model for phenotype prediction. According to recent studies, our method has identified many literature-supported genes/proteins and their associated phenotypes, as well as some candidate genes with newly assigned phenotypes, and thus provides a new computational tool for accurate and effective phenotype prediction.
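
The incremental feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes the ranked feature list already exists (for example, from a ranking method such as minimum redundancy maximum relevance) and, to stay self-contained, swaps in a simple 1-nearest-neighbor multi-label predictor in place of the RAkEL-based classifiers used in the paper. The function names, the leave-one-out evaluation, the per-label accuracy score, and the toy data are all hypothetical choices for illustration, not the authors' implementation.

```python
def nearest_neighbor_labels(train_X, train_Y, x, features):
    """Return the label vector of the training sample closest to x,
    measuring squared Euclidean distance on the chosen feature indices only."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((train_X[i][f] - x[f]) ** 2 for f in features),
    )
    return train_Y[best]


def leave_one_out_accuracy(X, Y, features):
    """Fraction of (sample, label) pairs predicted correctly when each
    sample is held out in turn (a stand-in for the paper's evaluation)."""
    correct = total = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_Y = Y[:i] + Y[i + 1:]
        predicted = nearest_neighbor_labels(train_X, train_Y, X[i], features)
        correct += sum(p == t for p, t in zip(predicted, Y[i]))
        total += len(Y[i])
    return correct / total


def incremental_feature_selection(X, Y, ranked_features, step=1):
    """Evaluate the top-k ranked features for k = step, 2*step, ... and
    keep the smallest k that reaches the best score."""
    best_k, best_score = 0, -1.0
    scores = []
    for k in range(step, len(ranked_features) + 1, step):
        score = leave_one_out_accuracy(X, Y, ranked_features[:k])
        scores.append((k, score))
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, scores


# Toy data (hypothetical): feature 0 drives label 0, feature 1 drives
# label 1, and feature 2 is noise on a larger scale.
X = [
    [0.0, 0.0, 5.0], [0.1, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 5.0],
    [1.0, 0.0, 5.0], [0.9, 0.1, 0.0],
    [1.0, 1.0, 0.0], [0.9, 0.9, 5.0],
]
Y = [
    [0, 0], [0, 0],
    [0, 1], [0, 1],
    [1, 0], [1, 0],
    [1, 1], [1, 1],
]

selected, best_score, scores = incremental_feature_selection(X, Y, [0, 1, 2])
print("selected features:", selected)
print("best score:", best_score)
```

On this toy data the search keeps the top two ranked features and drops the noisy third one. In practice the inner evaluation would be replaced by cross-validation of the actual multi-label classifier, and the step size would be larger for long feature lists.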




Updated: 2021-04-29