GSCNN: a composition of CNN and Gibb Sampling computational strategy for predicting promoter in bacterial genomes
International Journal of Information Technology Pub Date : 2021-01-04 , DOI: 10.1007/s41870-020-00565-y
S. Sasikala , T. Ratha Jeyalakshmi

This paper presents a new computational strategy for finding the promoter sequence, the region of DNA where transcription starts and without which a gene cannot be activated for protein synthesis. Genes are the hereditary basis of an organism's functions, and a sequence of steps is needed to uncover the protein production process encoded in them. Promoter prediction is therefore essential for correctly decoding the genetic information behind protein synthesis, and it also helps biologists identify genetic malfunctions. Numerous computational methods have been proposed to support biologists in related tasks. Earlier work applied Gibbs sampling, Markov chain models, and machine learning strategies to predict promoters. This paper introduces a new computational strategy, GSCNN, which combines Gibbs sampling with a Convolutional Neural Network (CNN). The model uses Gibbs sampling to select the best profile features of the DNA and, with the help of a position frequency matrix, extracts their scores as a feature vector. The extracted feature vectors are then fed into the CNN to predict sigma 54 (σ54) promoters in a bacterial genome. The performance of the proposed system is evaluated on the Bacillus subtilis NC_000964 dataset. For σ54 promoter prediction, the proposed GSCNN method achieved a significant improvement in performance metrics over traditional machine learning algorithms such as Decision Tree (DT), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM).
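
The abstract does not give implementation details, so the following is only a minimal sketch of the feature extraction step it describes: building a position frequency matrix from aligned motif sites and turning window scores into a feature vector. The toy sites, the uniform background, the pseudocount, and all function names are assumptions made for illustration, not the authors' method. In GSCNN the aligned sites would come from Gibbs sampling and the resulting vector would be passed to the CNN; both steps are omitted here.

```python
import math

ALPHABET = "ACGT"


def position_frequency_matrix(sites):
    """Count each base at each column of equal-length aligned motif sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = [{base: 0 for base in ALPHABET} for _ in range(width)]
    for site in sites:
        for pos, base in enumerate(site):
            pfm[pos][base] += 1
    return pfm


def log_odds_matrix(pfm, pseudocount=1.0, background=0.25):
    """Turn counts into log2 odds against a uniform background (assumed)."""
    matrix = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(ALPHABET)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return matrix


def window_scores(sequence, matrix):
    """Score every window of motif width; the score list is the feature vector."""
    width = len(matrix)
    return [
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical toy sites standing in for the alignment a Gibbs sampler would return.
sites = ["TGGCAC", "TGGCAT", "TGGTAC", "TGGCAC"]
pfm = position_frequency_matrix(sites)
matrix = log_odds_matrix(pfm)

# Hypothetical 10-base sequence; it contains the motif starting at index 2.
features = window_scores("AATGGCACGT", matrix)
best_start = max(range(len(features)), key=features.__getitem__)
```

Here `features` holds one score per window and `best_start` is the index of the highest-scoring window. The pseudocount keeps bases that never appear in a column from producing an undefined logarithm.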




Updated: 2021-01-04