EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2018-07-23, DOI: 10.1109/tcbb.2018.2858806
Jiyun Zhou, Qin Lu, Ruifeng Xu, Lin Gui, Hongpeng Wang

Most previous work on DNA-binding residue prediction has not considered the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first component is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second component is an ensemble-learning-based classifier that addresses the data imbalance problem in binding residue prediction. We use a variant of the bagging strategy from ensemble learning to obtain balanced samples. Evaluations on PDNA-224 and DBP-123 show that classifiers using feature relationships outperform those without them by at least 0.028 in MCC, 1.18 percent in ST, and 0.012 in AUC, which indicates that feature relationships are useful for DNA-binding residue prediction. Evaluation of ensemble learning shows improvements of at least 0.021 in MCC, 1.32 percent in ST, and 0.018 in AUC over a single LSTM classifier. Comparisons with state-of-the-art predictors show that EL_LSTM significantly outperforms them. Further feature analysis validates the effectiveness of LSTM for predicting DNA-binding residues.
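The abstract does not specify which bagging variant is used. As a minimal sketch under stated assumptions, one common way to build balanced bags is shown here: the non-binding (majority) residues are shuffled and split into disjoint chunks roughly the size of the binding (minority) set, each chunk is paired with all binding residues, one base classifier is trained per bag, and predictions are combined by majority vote. The names `balanced_bags`, `train_threshold`, and `vote` are hypothetical; `train_threshold` is a toy stand-in for the paper's LSTM-based classifier, and the data are invented for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A sample is (feature vector, label); label 1 = DNA-binding, 0 = non-binding.
Sample = Tuple[List[float], int]
Classifier = Callable[[List[float]], int]


def balanced_bags(samples: Sequence[Sample], seed: int = 0) -> List[List[Sample]]:
    """Hypothetical balancing scheme: pair every minority (binding) sample with
    a disjoint chunk of the majority (non-binding) samples."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    random.Random(seed).shuffle(neg)
    n_bags = max(1, len(neg) // max(1, len(pos)))
    # Round-robin split: every negative lands in exactly one bag and
    # chunk sizes differ by at most one.
    return [pos + neg[k::n_bags] for k in range(n_bags)]


def train_threshold(bag: Sequence[Sample]) -> Classifier:
    """Hypothetical stand-in for an LSTM-based base classifier: thresholds the
    first feature halfway between the two class means in the bag."""
    pos = [x[0] for x, y in bag if y == 1]
    neg = [x[0] for x, y in bag if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] >= cut else 0


def vote(models: Sequence[Classifier], x: List[float]) -> int:
    """Strict majority vote; ties go to the non-binding class."""
    votes = sum(m(x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy, invented data: 2 binding vs. 10 non-binding residues.
data = [([0.9], 1), ([0.8], 1)] + [([0.05 * i], 0) for i in range(10)]
bags = balanced_bags(data)
models = [train_threshold(b) for b in bags]
print(len(bags), [len(b) for b in bags])  # 5 [4, 4, 4, 4, 4]
print(vote(models, [0.85]), vote(models, [0.05]))  # 1 0
```

Each bag sees every binding residue but only a fifth of the non-binding ones, so no base classifier is trained on a 1:5 imbalance, while the ensemble as a whole still uses all of the data.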

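The reported gains are differences in MCC, ST, and AUC. Assuming ST denotes sensitivity (its usual meaning in DNA-binding residue prediction; the abstract does not define it), these metrics can be computed from confusion-matrix counts and per-residue scores as in this minimal sketch. The function names are illustrative, and the counts and scores in the example are made up.

```python
import math
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """ST (assumed to be sensitivity): share of true binding residues predicted as binding."""
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random binding residue scores higher
    than a random non-binding one (ties count 1/2). O(P*N): fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented counts and scores for illustration only.
print(sensitivity(tp=8, fn=2))                  # 0.8
print(round(mcc(tp=8, tn=80, fp=10, fn=2), 3))  # 0.538
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

MCC uses all four confusion-matrix cells, which is why it is preferred over plain accuracy when non-binding residues vastly outnumber binding ones.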
Updated: 2020-03-07