Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning
PeerJ ( IF 2.3 ) Pub Date : 2021-05-03 , DOI: 10.7717/peerj.11262
Guobin Li 1 , Xiuquan Du 2 , Xinlu Li 1 , Le Zou 1 , Guanhong Zhang 1 , Zhize Wu 1

DNA-binding proteins (DBPs) play pivotal roles in many biological functions such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual feature extraction or fail to capture long-term dependencies in the DNA sequence. In this paper, we propose a method, called PDBP-Fusion, that identifies DBPs by fusing local features and long-term dependencies learned from primary sequences alone. We use a convolutional neural network (CNN) to learn local features and a bidirectional long short-term memory network (Bi-LSTM) to capture critical long-term dependencies in context. In addition, feature extraction, model training, and model prediction are performed simultaneously. PDBP-Fusion predicts DBPs with 86.45% sensitivity, 79.13% specificity, 82.81% accuracy, and an MCC of 0.661 on the PDB14189 benchmark dataset. The MCC of the proposed method is at least 9.1% higher than that of other advanced prediction models. Moreover, PDBP-Fusion also achieves superior performance and model robustness on the PDB2272 independent dataset. These results demonstrate that PDBP-Fusion can predict DBPs from sequences accurately and effectively. The online server is available at http://119.45.144.26:8080/PDBP-Fusion/.
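
The four figures reported above (sensitivity, specificity, accuracy, and MCC) are standard binary-classification metrics derived from a confusion matrix. The following is a minimal sketch of how they are usually computed, not the authors' evaluation code. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the PDB14189 or PDB2272 results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and MCC.

    tp/fn: DBPs predicted as DBP / as non-DBP.
    tn/fp: non-DBPs predicted as non-DBP / as DBP.
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # Matthews correlation coefficient; defined as 0 when the
    # denominator vanishes (e.g. every sample predicted as one class).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=80, fn=20, tn=70, fp=30)
print({name: round(value, 3) for name, value in example.items()})
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when sensitivity and specificity differ, as they do in the results quoted above.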

Updated: 2021-05-03