IDRBP-PPCT: Identifying Nucleic Acid-Binding Proteins Based on Position-Specific Score Matrix and Position-Specific Frequency Matrix Cross Transformation
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-03-29, DOI: 10.1109/tcbb.2021.3069263
Ning Wang, Jun Zhang, Bin Liu

DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two important classes of nucleic acid-binding proteins (NABPs), which play important roles in biological processes such as the replication, translation and transcription of genetic material. Some proteins (DRBPs) bind to both DNA and RNA and also play a key role in gene expression. Identifying DBPs, RBPs and DRBPs is important for studying protein-nucleic acid interactions. A growing number of computational methods have been proposed to identify DNA- or RNA-binding proteins automatically from protein sequences alone. One challenge is to design an effective protein representation method that converts protein sequences of varying length into fixed-dimension feature vectors. In this study, we proposed a novel protein representation method called Position-Specific Scoring Matrix (PSSM) and Position-Specific Frequency Matrix (PSFM) Cross Transformation (PPCT). This method captures the evolutionary information in PSSM and PSFM as well as the correlations between them. A new computational predictor called IDRBP-PPCT was then proposed, which combines PPCT with a two-layer framework based on the random forest algorithm to identify DBPs, RBPs and DRBPs. Experimental results on the independent dataset and the tomato genome demonstrated the effectiveness of the proposed method. A user-friendly web server for IDRBP-PPCT has been constructed and is freely available at http://bliulab.net/IDRBP-PPCT.
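The abstract does not give the PPCT formula. To illustrate how a "cross transformation" can turn two variable-length L x 20 profiles (a PSSM and a PSFM) into a fixed-dimension vector, here is a minimal sketch. It uses a lagged cross-covariance between PSSM columns and PSFM columns, a technique borrowed from auto-cross covariance (ACC) style encodings, and is an assumption rather than the published PPCT definition. The function names `cross_transform` and `random_profile`, the `max_lag` parameter and the random toy profiles are all hypothetical.

```python
import random
from statistics import fmean


def cross_transform(pssm, psfm, max_lag=2):
    """Hypothetical PSSM/PSFM cross transformation (cross-covariance style).

    pssm and psfm are L x C matrices (lists of rows, C = 20 for proteins).
    Returns C * C * max_lag features, independent of the sequence length L.
    This is an illustrative assumption, not the published PPCT formula.
    """
    length = len(pssm)
    if length != len(psfm):
        raise ValueError("PSSM and PSFM must have the same number of rows")
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    cols = len(pssm[0])
    # Mean of every column, taken over all sequence positions.
    pssm_mean = [fmean(row[i] for row in pssm) for i in range(cols)]
    psfm_mean = [fmean(row[j] for row in psfm) for j in range(cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for i in range(cols):        # PSSM column (residue type i)
            for j in range(cols):    # PSFM column (residue type j)
                # Covariance between PSSM column i at position k and
                # PSFM column j at position k + lag, averaged over positions.
                total = sum(
                    (pssm[k][i] - pssm_mean[i]) * (psfm[k + lag][j] - psfm_mean[j])
                    for k in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


def random_profile(length, cols=20, integer=False):
    """Hypothetical toy stand-in for a real profile: integer scores
    (PSSM-like) or row-normalized frequencies (PSFM-like)."""
    rows = []
    for _ in range(length):
        if integer:
            rows.append([random.randint(-5, 5) for _ in range(cols)])
        else:
            raw = [random.random() for _ in range(cols)]
            total = sum(raw)
            rows.append([v / total for v in raw])
    return rows


if __name__ == "__main__":
    random.seed(0)
    for seq_len in (30, 55):
        vec = cross_transform(random_profile(seq_len, integer=True),
                              random_profile(seq_len), max_lag=2)
        print(seq_len, len(vec))  # 800 features for both lengths
```

Because every output element is an average over sequence positions, proteins of different lengths map to vectors of the same size (20 x 20 x max_lag), which is the kind of fixed-dimension input a random forest classifier expects. In practice the PSSM and PSFM would come from a PSI-BLAST search rather than random numbers.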

Updated: 2021-03-29