Predicting Protein Phosphorylation Sites Based on Deep Learning
Current Bioinformatics (IF 2.4), Pub Date: 2020-05-01, DOI: 10.2174/1574893614666190902154332
Haixia Long 1 , Zhao Sun 2 , Manzhi Li 3 , Hai Yan Fu 1 , Ming Cai Lin 4

Background: Protein phosphorylation is one of the most important post-translational modifications (PTMs), occurring at the amino acid residues serine (S), threonine (T), and tyrosine (Y). It plays a critical role in predicting protein structure and function. With the development of novel high-throughput sequencing technologies, a huge number of protein sequences are being generated and stored in databases.

Objective: Quickly and accurately predicting which S, T, or Y residues can be phosphorylated is of great importance in both basic research and drug development.
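
As an illustration of the prediction task, the sketch below enumerates every S, T, or Y residue in a sequence together with a fixed-length window of neighbouring residues. Windowing around candidate sites is a common preprocessing step for this kind of predictor, but the abstract does not say how the authors prepared their inputs, so the function name `candidate_windows`, the `flank` parameter, and the `-` padding symbol are all assumptions made for this example.

```python
from typing import List, Tuple

TARGETS = {"S", "T", "Y"}  # residues named in the abstract
PAD = "-"                  # hypothetical padding symbol for sequence ends


def candidate_windows(seq: str, flank: int = 3) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every S/T/Y in seq.

    Each window holds `flank` residues on either side of the candidate
    site; positions past the ends of the sequence are filled with PAD.
    """
    seq = seq.upper()
    padded = PAD * flank + seq + PAD * flank
    sites = []
    for i, residue in enumerate(seq):
        if residue in TARGETS:
            # In the padded string the residue sits at index i + flank,
            # so the window centred on it starts at index i.
            window = padded[i:i + 2 * flank + 1]
            sites.append((i + 1, residue, window))
    return sites


if __name__ == "__main__":
    for position, residue, window in candidate_windows("MSTAY", flank=2):
        print(position, residue, window)
```

Each window would then be labelled as phosphorylated or not and passed to a classifier.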

Methods: To address this problem, a novel hybrid deep learning model combining a convolutional neural network with a bi-directional long short-term memory recurrent neural network (CNN+BLSTM) is proposed for predicting phosphorylation sites in proteins. The model consists of a stack of layers that transform the input data into an output class: the convolution layer captures higher-level abstract features of the amino acids, while the recurrent layer captures long-term dependencies between amino acids to improve predictions. The joint model learns interactions between the higher-level features derived from the protein sequence to predict phosphorylated sites.
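
The layer ordering described above (a convolution layer followed by a bi-directional LSTM and an output class) can be sketched in PyTorch as follows. This is a minimal sketch and not the authors' implementation: the abstract gives no hyperparameters, so the embedding layer, the vocabulary size of 21 (20 amino acids plus one padding token), the layer widths, the kernel size, and the class name `CNNBLSTM` are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class CNNBLSTM(nn.Module):
    """Illustrative CNN + bi-directional LSTM binary classifier."""

    def __init__(self, vocab_size=21, embed_dim=16, conv_channels=32,
                 kernel_size=3, lstm_hidden=32):
        super().__init__()
        # Assumed input encoding: integer-coded residues mapped to vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution over sequence positions extracts local features;
        # with an odd kernel this padding keeps the length unchanged.
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        # The bi-directional LSTM reads the feature sequence both ways.
        self.blstm = nn.LSTM(conv_channels, lstm_hidden,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_hidden, 1)

    def forward(self, tokens):
        x = self.embed(tokens)        # (batch, length, embed_dim)
        x = x.transpose(1, 2)         # (batch, embed_dim, length)
        x = torch.relu(self.conv(x))  # (batch, conv_channels, length)
        x = x.transpose(1, 2)         # (batch, length, conv_channels)
        _, (h_n, _) = self.blstm(x)   # h_n: (2, batch, lstm_hidden)
        # Concatenate the final forward and backward hidden states.
        h = torch.cat([h_n[0], h_n[1]], dim=1)
        # One probability per window: phosphorylated or not.
        return torch.sigmoid(self.out(h)).squeeze(1)


if __name__ == "__main__":
    model = CNNBLSTM()
    windows = torch.randint(0, 21, (4, 33))  # 4 random windows of length 33
    print(model(windows).shape)
```

Placing the convolution before the recurrent layer means the LSTM operates on local feature maps rather than raw residues, which matches the division of labour the abstract describes.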

Results: We compared our model with two established methods, iPhos-PseEn and MusiteDeep. Five-fold cross-validation indicated that CNN+BLSTM outperforms both competitors on several evaluation metrics, including the areas under the receiver operating characteristic and precision-recall curves, the Matthews correlation coefficient, F-measure, and accuracy.
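
For reference, three of the threshold-based metrics named above (accuracy, F-measure, and the Matthews correlation coefficient) can be computed from binary labels with their standard definitions, as in the sketch below. The function name `binary_metrics` and the toy labels are made up for this example and are not data from the paper; the two area-under-curve metrics are omitted because they need predicted scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, F-measure (F1) and MCC for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # MCC is conventionally reported as 0 when any marginal count is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, preds))
```

Under 5-fold cross-validation, such metrics would be computed on each held-out fold and then averaged.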

Conclusion: CNN+BLSTM is a promising tool for identifying potential protein phosphorylation sites for further experimental validation.




Updated: 2020-05-01