Transformer protein language models are unsupervised structure learners
bioRxiv - Synthetic Biology. Pub Date: 2020-12-15. DOI: 10.1101/2020.12.15.422761
Roshan M. Rao, Joshua Meier, Tom Sercu, Sergey Ovchinnikov, Alexander Rives

Unsupervised contact prediction is central to uncovering physical, structural, and functional constraints for protein structure determination and design. For decades, the predominant approach has been to infer evolutionary constraints from a set of related sequences. In the past year, protein language models have emerged as a potential alternative, but performance has fallen short of state-of-the-art approaches in bioinformatics. In this paper we demonstrate that Transformer attention maps learn contacts from the unsupervised language modeling objective. We find the highest capacity models that have been trained to date already outperform a state-of-the-art unsupervised contact prediction pipeline, suggesting these pipelines can be replaced with a single forward pass of an end-to-end model.
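The central observation is that, after unsupervised pre-training, individual self-attention heads already assign high weight to residue pairs that are in contact in the folded structure. The snippet below is a minimal illustrative sketch, not the authors' exact pipeline: it shows one plausible way to turn per-head attention maps into a single contact map by symmetrizing each head, applying average product correction (APC, a standard background correction from coevolution-based contact prediction), and averaging heads uniformly. The tensor shapes, the uniform averaging, and the placeholder attention input are assumptions for illustration; the paper instead combines heads with learned weights.

    # Illustrative sketch only: turn per-head attention maps into contact scores.
    import torch

    def apc(mat: torch.Tensor) -> torch.Tensor:
        """Average product correction of an L x L score matrix."""
        row = mat.sum(dim=-1, keepdim=True)          # (..., L, 1)
        col = mat.sum(dim=-2, keepdim=True)          # (..., 1, L)
        total = mat.sum(dim=(-1, -2), keepdim=True)  # (..., 1, 1)
        return mat - row * col / total               # subtract background expectation

    def attention_to_contacts(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_layers, num_heads, L, L) attention weights for one sequence.
        Returns an (L, L) matrix where larger values suggest residue-residue contacts."""
        sym = attn + attn.transpose(-1, -2)   # attention is asymmetric; contacts are symmetric
        corrected = apc(sym)                  # remove row/column background effects
        return corrected.mean(dim=(0, 1))     # naive uniform average over layers and heads

    if __name__ == "__main__":
        L = 128  # hypothetical sequence length
        fake_attn = torch.rand(33, 20, L, L).softmax(dim=-1)  # placeholder attention maps
        contact_scores = attention_to_contacts(fake_attn)
        print(contact_scores.shape)  # torch.Size([128, 128])

Because the attention maps come out of the same forward pass used for language modeling, contact extraction of this kind adds essentially no cost on top of running the pretrained model, which is the sense in which a multi-stage bioinformatics pipeline can be replaced by a single forward pass.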

Updated: 2020-12-17