Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction
Structure (IF 4.4), Pub Date: 2022-05-23, DOI: 10.1016/j.str.2022.05.001
Konstantin Weissenow, Michael Heinzinger, Burkhard Rost

Advanced protein structure prediction requires evolutionary information from multiple sequence alignments (MSAs) or from evolutionary couplings, which are not always available. Artificial intelligence (AI)-based predictions that input only single sequences are faster but so inaccurate as to render the speed irrelevant. Here, we describe a competitive prediction of inter-residue distances (2D structure) that exclusively inputs embeddings from single sequences, obtained from pre-trained protein language models (pLMs), namely ProtT5, into a convolutional neural network (CNN) with relatively few layers. The major advance came from using the ProtT5 attention heads. Our new method, EMBER2, never requires any MSAs and performed similarly to methods that fully rely on co-evolution. Although clearly not reaching AlphaFold2, our leaner solution came somewhat close at substantially lower cost. By generating protein-specific rather than family-averaged predictions, EMBER2 might better capture some features of particular protein structures. Results from protein engineering and deep mutational scanning (DMS) experiments provided at least a proof of principle for this speculation.
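The pipeline described above (single-sequence ProtT5 embeddings and attention heads turned into pairwise features for a shallow 2D CNN that outputs inter-residue distances) can be sketched roughly as follows. This is a minimal illustration assuming the publicly available Rostlab/prot_t5_xl_uniref50 checkpoint from HuggingFace transformers; the ShallowDistanceCNN, its channel sizes, and the distance binning are invented for the example and are not the published EMBER2 architecture.

```python
# Hedged sketch: embed a single protein sequence with ProtT5, build a pairwise
# feature map from residue embeddings plus attention heads, and run a shallow
# 2D CNN over it to predict inter-residue distance bins.
import torch
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

MODEL_NAME = "Rostlab/prot_t5_xl_uniref50"  # pre-trained ProtT5 checkpoint (large download)

def embed_sequence(seq: str, device: str = "cpu"):
    """Return per-residue embeddings (L x 1024) and stacked attention maps for one sequence."""
    tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(MODEL_NAME, output_attentions=True).to(device).eval()
    spaced = " ".join(list(seq))            # ProtT5 expects space-separated residues
    batch = tokenizer(spaced, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**batch)
    L = len(seq)
    emb = out.last_hidden_state[0, :L]      # drop the trailing special token
    # stack attention heads from all encoder layers: (layers*heads, L, L)
    attn = torch.cat([a[0, :, :L, :L] for a in out.attentions], dim=0)
    return emb, attn

def pairwise_features(emb: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """Outer concatenation of residue embeddings, with attention maps as extra channels."""
    L, d = emb.shape
    a = emb.unsqueeze(1).expand(L, L, d)    # residue i features broadcast over j
    b = emb.unsqueeze(0).expand(L, L, d)    # residue j features broadcast over i
    pair = torch.cat([a, b], dim=-1).permute(2, 0, 1)    # (2d, L, L)
    return torch.cat([pair, attn], dim=0).unsqueeze(0)   # (1, 2d + heads, L, L)

class ShallowDistanceCNN(nn.Module):
    """Few-layer 2D CNN mapping pairwise features to distance-bin logits (illustrative)."""
    def __init__(self, in_channels: int, n_bins: int = 42):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, n_bins, kernel_size=3, padding=1),
        )

    def forward(self, pair_feats):          # (1, C, L, L)
        return self.net(pair_feats)         # (1, n_bins, L, L)

if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy example sequence
    emb, attn = embed_sequence(seq)
    feats = pairwise_features(emb, attn)
    model = ShallowDistanceCNN(in_channels=feats.shape[1])
    with torch.no_grad():
        logits = model(feats)               # per-pair distance-bin logits
    print(logits.shape)                     # torch.Size([1, 42, L, L])
```

Because no MSA is built, the runtime is dominated by a single forward pass through the pLM plus a lightweight CNN, which is what makes this kind of approach fast and alignment-free.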



Updated: 2022-05-23