DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment

BMC Bioinformatics (IF 2.9), Pub Date: 2020-01-09, DOI: 10.1186/s12859-019-3190-x
Hiroyuki Fukuda¹, Kentaro Tomii¹,²


BACKGROUND Recently developed methods of protein contact prediction, a crucial step in protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating so rapidly that abundant sequences for constructing an MSA of a target protein are often readily obtainable. Nevertheless, the number of sequences that can be included in an MSA for contact prediction varies widely between targets, from very many to very few. Too many sequences might degrade prediction results, whereas there is still room to improve predictions from MSAs built from a limited number of sequences. To address both issues, we developed a novel framework that uses DNNs in an end-to-end manner for contact prediction.

RESULTS We developed neural network models to improve the precision of contact prediction for both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to sequences in a deep MSA. Moreover, for shallow MSAs, adding a few sequential features increased the prediction accuracy of long-range contacts in our model. Building on these models, we extended our approach to a multi-task model that achieves higher accuracy by incorporating predictions of secondary structures and solvent-accessible surface areas. We also demonstrated that ensemble averaging of our models raises accuracy. Tested on past CASP target protein domains, our final model is superior or equivalent to existing meta-predictors.

CONCLUSIONS The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences, including metagenomic sequences, have become accessible, and this abundance might degrade contact prediction results. Under such circumstances, our model provides a means to reduce noise automatically. According to results of tertiary structure prediction based on contacts and secondary structures predicted by our model, starting from the MSA of a target protein, more accurate three-dimensional models are obtainable than with existing evolutionary coupling analysis (ECA) methods. DeepECA is available from https://github.com/tomiilab/DeepECA.
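
To make the idea of assigning weights to sequences in a deep MSA concrete, the sketch below computes a sequence-weighted covariance matrix from a one-hot encoded alignment, a quantity commonly used as input to coupling-based contact predictors. It is an illustration under stated assumptions, not the DeepECA implementation: the function name `weighted_covariance`, the parameter `n_states`, and the toy alignment are hypothetical, and the weights are supplied as fixed inputs because the abstract does not describe how the model derives them.

```python
import numpy as np


def weighted_covariance(msa, weights, n_states=21):
    """Sequence-weighted covariance of a one-hot encoded MSA.

    msa     : (N, L) integer array, residue codes in [0, n_states)
    weights : (N,) non-negative per-sequence weights (hypothetical inputs)
    returns : (L, n_states, L, n_states) covariance tensor
    """
    msa = np.asarray(msa)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                 # normalise weights to sum to 1
    n_seq, length = msa.shape

    one_hot = np.eye(n_states)[msa]                 # (N, L, n_states)
    x = one_hot.reshape(n_seq, length * n_states)   # flatten (column, residue) pairs

    f_i = w @ x                                     # weighted single-site frequencies
    f_ij = (x * w[:, None]).T @ x                   # weighted pair frequencies
    cov = f_ij - np.outer(f_i, f_i)                 # covariance = pair - product of singles
    return cov.reshape(length, n_states, length, n_states)


# Toy alignment: three identical sequences plus one divergent sequence,
# using a 4-letter alphabet to keep the example small.
toy_msa = np.array([[0, 1, 2],
                    [0, 1, 2],
                    [0, 1, 2],
                    [3, 2, 1]])

# Uniform weights treat every sequence equally.
cov_uniform = weighted_covariance(toy_msa, np.ones(4), n_states=4)

# Down-weighting the three redundant copies gives the divergent sequence
# as much influence as the whole redundant cluster.
cov_down = weighted_covariance(toy_msa, [1 / 3, 1 / 3, 1 / 3, 1.0], n_states=4)
```

With uniform weights the result reduces to the ordinary (biased) covariance of the one-hot columns; changing the weights shifts how strongly redundant or noisy sequences contribute to the statistics a downstream predictor sees.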


Updated: 2020-01-09
