Highly accurate protein structure prediction for the human proteome
Nature (IF 64.8), Pub Date: 2021-07-22, DOI: 10.1038/s41586-021-03828-1
Kathryn Tunyasuvunakool 1 , Jonas Adler 1 , Zachary Wu 1 , Tim Green 1 , Michal Zielinski 1 , Augustin Žídek 1 , Alex Bridgland 1 , Andrew Cowie 1 , Clemens Meyer 1 , Agata Laydon 1 , Sameer Velankar 2 , Gerard J Kleywegt 2 , Alex Bateman 2 , Richard Evans 1 , Alexander Pritzel 1 , Michael Figurnov 1 , Olaf Ronneberger 1 , Russ Bates 1 , Simon A A Kohl 1 , Anna Potapenko 1 , Andrew J Ballard 1 , Bernardino Romera-Paredes 1 , Stanislav Nikolov 1 , Rishub Jain 1 , Ellen Clancy 1 , David Reiman 1 , Stig Petersen 1 , Andrew W Senior 1 , Koray Kavukcuoglu 1 , Ewan Birney 2 , Pushmeet Kohli 1 , John Jumper 1 , Demis Hassabis 1

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.




Updated: 2021-07-22