Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning
Journal of Computational Chemistry ( IF 3 ) Pub Date : 2018-10-05 , DOI: 10.1002/jcc.25534
Rhys Heffernan 1 , Kuldip Paliwal 1 , James Lyons 1 , Jaswinder Singh 1 , Yuedong Yang 2 , Yaoqi Zhou 3

Predicting protein structure from sequence alone is challenging. Thus, the majority of methods for protein structure prediction rely on evolutionary information from multiple sequence alignments. In previous work we showed that Long Short‐Term Memory Bidirectional Recurrent Neural Networks (LSTM‐BRNNs) improved over regular neural networks by better capturing intra‐sequence dependencies. Here we present a single‐sequence‐based prediction method employing LSTM‐BRNNs (SPIDER3‐Single) that consistently achieves a Q3 accuracy of 72.5% and a correlation coefficient of 0.67 between predicted and actual solvent accessible surface area. Moreover, it yields reasonably accurate predictions of eight‐state secondary structure, main‐chain angles (backbone ϕ and ψ torsion angles and Cα‐atom‐based θ and τ angles), half‐sphere exposure, and contact number. The method is more accurate than the corresponding evolutionary‐based method for proteins with few sequence homologs, and is computationally efficient for large‐scale screening of protein‐structural properties. It is available as an option in the SPIDER3 server, and as a standalone version for download, at http://sparks-lab.org. © 2018 Wiley Periodicals, Inc.
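
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how a Q3 accuracy and a Pearson correlation coefficient are typically computed. It is not taken from SPIDER3‐Single; the function names `q3_accuracy` and `pearson` and the toy inputs are assumptions made purely for illustration, and the abstract does not state which correlation variant the authors used.

```python
from math import sqrt


def q3_accuracy(predicted: str, actual: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) matches.

    Hypothetical helper: both arguments are per-residue label strings
    of equal length, one character per residue.
    """
    if len(predicted) != len(actual):
        raise ValueError("label strings must have equal length")
    if not actual:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper, e.g. for predicted versus actual solvent
    accessible surface area values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data, invented for illustration only (not results from the paper).
toy_q3 = q3_accuracy("HHHECC", "HHHEEC")          # 5 of 6 labels agree
toy_r = pearson([10.0, 20.0, 30.0], [12.0, 24.0, 36.0])
```

A Q3 of 72.5% in this sense would mean that roughly 72 or 73 of every 100 residues receive the correct helix, strand, or coil label.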

Updated: 2018-10-05