ncPro-ML: an integrated computational tool for identifying non-coding RNA promoters in multiple species
Computational and Structural Biotechnology Journal (IF 4.4), Pub Date: 2020-09-10, DOI: 10.1016/j.csbj.2020.09.001
Qiang Tang, Fulei Nie, Juanjuan Kang, Wei Chen

The promoter is located near the transcription start site and regulates transcription initiation of the gene. Accurate identification of promoters is essential for understanding the mechanism of gene regulation. Since experimental methods are costly and inefficient, developing efficient and accurate computational tools to identify promoters is necessary. Although a series of methods have been proposed for identifying promoters, none of them is able to identify the promoters of non-coding RNA (ncRNA). In the present work, a new method called ncPro-ML is proposed to identify ncRNA promoters in Homo sapiens and Mus musculus, in which different kinds of sequence encoding schemes are used to convert DNA sequences into feature vectors. To test the effect of sequence length, datasets containing sequences of different lengths were built for each species. The results demonstrate that ncPro-ML achieved its best performance on the dataset with a sequence length of 221 nucleotides for both human and mouse. ncPro-ML also performed satisfactorily in both the independent dataset test and the cross-species test. The results indicate that the proposed predictor can serve as a powerful tool for the discovery of ncRNA promoters. In addition, a web server for ncPro-ML was developed, which can be freely accessed at http://www.bio-bigdata.cn/ncPro-ML/.
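The abstract does not say which sequence encoding schemes ncPro-ML uses. As a rough illustration of what "converting DNA sequences into feature vectors" can mean in general, the sketch below shows two common encodings, k-mer frequencies and one-hot encoding. The function names `kmer_frequencies` and `one_hot` are hypothetical and are not taken from the paper or its web server.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int = 2) -> List[float]:
    """Return the normalized frequency of every possible k-mer (4**k values).

    Illustrative only: this is a generic encoding, not necessarily one of
    the schemes used by ncPro-ML.
    """
    seq = seq.upper()
    # Fixed ordering of all k-mers so every sequence maps to the same layout.
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def one_hot(seq: str) -> List[int]:
    """Flattened one-hot encoding: 4 values per nucleotide, in A, C, G, T order.

    Unknown symbols (for example N) are encoded as four zeros.
    """
    vector: List[int] = []
    for base in seq.upper():
        vector.extend(1 if base == b else 0 for b in BASES)
    return vector


if __name__ == "__main__":
    example = "ACGTAC"  # toy sequence, not from the ncPro-ML datasets
    print(len(kmer_frequencies(example, k=2)))
    print(one_hot("AC"))
```

With a fixed sequence length such as the 221 nucleotides mentioned above, a one-hot encoding of this kind would yield a vector of 884 values per sequence, while the k-mer frequency vector has 4**k values regardless of length.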




Updated: 2020-09-10