Expanding functional protein sequence spaces using generative adversarial networks
Nature Machine Intelligence (IF 23.8), Pub Date: 2021-03-04, DOI: 10.1038/s42256-021-00310-5
Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Irmantas Rokaitis, Jan Zrimec, Simona Poviloniene, Audrius Laurynenas, Sandra Viknander, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist, Aleksej Zelezniak

De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering because of the broad spectrum of technological, scientific and medical applications. However, mapping protein sequence to protein function is currently neither computationally nor experimentally tangible. Here, we develop ProteinGAN, a self-attention-based variant of the generative adversarial network that is able to ‘learn’ natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino-acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase (MDH) as a template enzyme, we show that 24% (13 out of 55 tested) of the ProteinGAN-generated and experimentally tested sequences are soluble and display MDH catalytic activity in the tested conditions in vitro, including a highly mutated variant of 106 amino-acid substitutions. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse functional proteins within the allowed biological constraints of the sequence space.


Updated: 2021-03-04