Predicting essential genes of 37 prokaryotes by combining information-theoretic features
Journal of Microbiological Methods (IF 1.7), Pub Date: 2021-07-31, DOI: 10.1016/j.mimet.2021.106297
Xiao Liu, Yachuan Luo, Ting He, Meixiang Ren, Yuqiao Xu

Essential genes are required for the reproduction and survival of an organism. Rapid identification of essential genes has practical application value in biomedicine. Information theory is a discipline that studies information transmission. Based on the similarity between heredity and information transmission, measures derived from information theory can be applied to genetic sequence analysis on different scales.
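As a rough illustration of what an information-theoretic sequence measure can look like, the sketch below computes the Shannon entropy of the overlapping k-mer distribution of a gene sequence. The abstract does not list the 114 features actually used, so the function name `kmer_entropy` and the choice of this particular measure are assumptions made for illustration only.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Hypothetical example feature; not necessarily one of the paper's 114 features.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        # Sequence shorter than k: no k-mers, so define the entropy as 0.
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    # H = sum over k-mers of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    print(kmer_entropy("ACGT"))       # four equally frequent bases
    print(kmer_entropy("ACACACAC", 2))
```

Varying `k` gives one simple way to look at a sequence "on different scales": `k = 1` reflects base composition, while larger `k` captures short-range context.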

In this study, we employed 114 features extracted by information theory methods to construct an essential gene prediction model. We applied a backpropagation neural network to construct a classifier and employed it to predict essential genes of 37 prokaryotes. The performance of the classifier was evaluated by applying intra-organism prediction and leave-one-species-out prediction. Among 37 prokaryotes, intra-organism prediction and leave-one-species-out prediction yielded average AUC scores of 0.791 and 0.717, respectively. Considering the potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. In the above two prediction methods, the average AUC scores of 37 organisms obtained by using key features were 0.786 and 0.714, respectively. The results show the potential and universality of information-theoretic features in the study of prokaryotic essential gene prediction.
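The leave-one-species-out protocol and the AUC metric described above can be sketched as follows. This is a minimal outline of the evaluation logic only, assuming each gene is tagged with a species identifier; it does not reproduce the authors' backpropagation neural network, and the names `auc_score` and `leave_one_species_out` are hypothetical helpers introduced here.

```python
from typing import Iterator, List, Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5).

    Simple O(n_pos * n_neg) pairwise version, written for clarity rather than speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def leave_one_species_out(
    species_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_species, train_indices, test_indices) for each species.

    Every gene of the held-out species goes to the test set; genes of all
    other species form the training set.
    """
    for held_out in sorted(set(species_ids)):
        train = [i for i, s in enumerate(species_ids) if s != held_out]
        test = [i for i, s in enumerate(species_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy data: species tag per gene (assumed layout, not from the paper).
    species = ["a", "a", "b", "c"]
    for held_out, train, test in leave_one_species_out(species):
        print(held_out, train, test)
    print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In such a setup, a classifier would be trained on the `train` indices and scored on the `test` indices for each held-out species, and the per-species AUC values would then be averaged, which is the kind of average the abstract reports.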




Updated: 2021-08-10