A new numerical approach for DNA representation using modified Gabor wavelet transform for the identification of protein coding regions
Biocybernetics and Biomedical Engineering (IF 5.3), Pub Date: 2020-04-03, DOI: 10.1016/j.bbe.2020.03.007
M. Raman Kumar, Naveen Kumar Vaegae

The fundamental step in genomic signal processing is to assign a mathematical descriptor to each nucleotide {A, T, G, C} of a DNA molecule, yielding a discrete representation. That representation should preserve the biological information of the gene when it is analyzed with digital signal processing tools. To this end, a novel binary representation of DNA sequences, combining the structural and chemical information of the original sequence, is proposed for identifying protein coding regions in eukaryotes. The identification model comprises two stages: numerical encoding in the first stage, and analysis of biological behavior through digital signal processing algorithms in the second. In the first stage, a new numerical encoding method based on Walsh codes of order 4 produces a 1-D binary discrete sequence. In the second stage, the modified Gabor wavelet transform (MGWT) is applied to the discretized DNA sequence for spectrum analysis. Together, the gene numerical encoding and the multiresolution approach of the MGWT readily identify the structures of coding regions in unknown gene sequences. The proposed model is validated by measuring prediction efficiency in terms of sensitivity, specificity, and accuracy at both the sequence level and the database level. Furthermore, the results are compared with state-of-the-art encoding methods by plotting receiver operating characteristic (ROC) curves over all classification thresholds. An area under the curve (AUC) of 0.86 at the sequence level and 0.84 at the database level is achieved. These performance metrics indicate that the proposed encoding method performs better than the other numerical encoding methods considered.
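The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption, since the abstract gives no details: the nucleotide-to-code assignment in `CODE` is hypothetical (the paper derives its own mapping from structural and chemical properties), the function names `encode` and `gabor_energy` are invented for illustration, and the transform follows one common formulation of the MGWT, a Gaussian-windowed complex exponential with a fixed analysis frequency evaluated at several scales. The sketch also assumes the usual period-3 signature of coding regions, which becomes a period of 12 samples once each base is expanded to 4 bits.

```python
import numpy as np

# Rows of the 4x4 Hadamard matrix (Walsh codes of order 4), with -1 mapped to 0.
WALSH4 = np.array([
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
], dtype=float)

# Hypothetical assignment of one Walsh code per nucleotide (not the paper's mapping).
CODE = {"A": WALSH4[0], "T": WALSH4[1], "G": WALSH4[2], "C": WALSH4[3]}


def encode(seq):
    """Stage 1: concatenate the 4-bit code of each base into a 1-D binary sequence."""
    return np.concatenate([CODE[base] for base in seq.upper()])


def gabor_energy(signal, period, scales=(6.0, 12.0, 24.0)):
    """Stage 2 (sketch): energy at a fixed period, summed over several Gaussian scales."""
    x = signal - signal.mean()          # remove the DC component
    energy = np.zeros(len(x))
    for a in scales:
        # Truncate the atom at 4 standard deviations, never longer than the signal.
        half = min(int(4 * a), (len(x) - 1) // 2)
        t = np.arange(-half, half + 1)
        atom = np.exp(-t**2 / (2 * a**2)) * np.exp(2j * np.pi * t / period)
        atom /= np.sqrt(a)              # simple scale normalization
        energy += np.abs(np.convolve(x, atom, mode="same")) ** 2
    return energy


# Toy usage: a codon-periodic sequence versus a constant one, both 90 bases long.
periodic = gabor_energy(encode("ATG" * 30), period=12)
flat = gabor_energy(encode("A" * 90), period=12)
```

In this toy case the codon-periodic sequence carries energy at the analysis period while the constant sequence carries none. Turning such an energy profile into coding-region predictions would require a threshold, which is where the ROC analysis mentioned in the abstract comes in.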




Updated: 2020-04-03