An intelligent computational model for prediction of promoters and their strength via natural language processing

Chemometrics and Intelligent Laboratory Systems (IF 3.7), Pub Date: 2020-07-01, DOI: 10.1016/j.chemolab.2020.104034
Muhammad Tahir, Maqsood Hayat, Sarah Gul, Kil To Chong

Abstract: In DNA, a promoter is an essential region of a gene that controls the transcription of specific genes in a particular tissue or cell type. RNA polymerase, in combination with various proteins named "σ-factors", can define the transcription start site (TSS) by inducing the RNA polymerase holoenzyme. Promoters are further categorized as strong or weak on the basis of promoter strength. Owing to the exponential increase of RNA/DNA and protein samples in the post-genomic era, developing a simple and efficient sequence-based intelligent computational model for the discrimination of promoters is a challenging task. To this end, an intelligent computational model named 2L-iPSW(word2vec) is introduced for the discrimination of promoters and their strength. Machine learning and deep learning algorithms are used in conjunction with the natural language processing method "word2vec". The proposed 2L-iPSW(word2vec) model achieved 91.42% accuracy in the first layer, which distinguishes promoters from non-promoters, 8.29% higher than the existing model, and 82.42% accuracy in the second layer, which distinguishes strong promoters from weak promoters, 11.22% higher than the existing model. The proposed 2L-iPSW(word2vec) model obtained higher success rates than the existing models on all assessment metrics. The 2L-iPSW(word2vec) model is therefore expected to be a useful tool for academic research on promoter identification.
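The two-layer scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how sequences are tokenized for word2vec or which classifiers are used, so the overlapping k-mer "words" (a common choice for DNA), the value k = 3, and the toy rules `toy_is_promoter` and `toy_is_strong` are all assumptions made only for illustration. In a real pipeline, each toy rule would be replaced by a trained classifier operating on word2vec embeddings of the k-mers.

```python
from typing import Callable, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers, treated as 'words'.

    Assumption: k-mer tokenization with k = 3; the abstract does not
    specify the tokenization fed to word2vec.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def two_layer_predict(
    seq: str,
    is_promoter: Callable[[List[str]], bool],
    is_strong: Callable[[List[str]], bool],
    k: int = 3,
) -> str:
    """Cascade two binary decisions, mirroring the two layers in the abstract.

    Layer 1: promoter vs. non-promoter.
    Layer 2 (only reached for promoters): strong vs. weak promoter.
    """
    tokens = kmer_tokens(seq, k)
    if not is_promoter(tokens):
        return "non-promoter"
    return "strong promoter" if is_strong(tokens) else "weak promoter"


# Hypothetical stand-ins for trained models; these rules are NOT biologically
# meaningful and exist only so that the sketch runs end to end.
def toy_is_promoter(tokens: List[str]) -> bool:
    return "TAT" in tokens


def toy_is_strong(tokens: List[str]) -> bool:
    return tokens.count("ATA") >= 2


if __name__ == "__main__":
    for s in ("TTGACATATAAT", "GGGCCC"):
        print(s, "->", two_layer_predict(s, toy_is_promoter, toy_is_strong))
```

The cascade keeps the two decisions separate, so the second-layer model only ever sees sequences that the first layer has already labeled as promoters.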


Updated: 2020-07-01
