Harnessing Human Neural Networks for Protein Design.
Biochemistry ( IF 2.9 ) Pub Date : 2019-12-13 , DOI: 10.1021/acs.biochem.9b00820
Po-Ssu Huang 1 , Kirsten A Thompson 1

Creating proteins to fulfill a variety of functional needs has long been a goal of biochemists. This requires a thorough understanding of the relationship between the sequence of a polypeptide chain and the resulting protein structure. In recent years, the field of protein design has finally reached a stage where it is possible to use physical and chemical principles to guide the design of novel protein structures. The goal of designing a protein structure is to produce an amino acid sequence that can fold into a target shape. To compute the sequence, most current methods explicitly model every atom in the system (with implicit solvent) to find a configuration that satisfies all of the interactions that each residue can make in its environment. While we are not yet capable of using these methods to design proteins with any arbitrary function, our ability to create structures that significantly differ from those observed in structural databases has reached new heights. Protein design has become more robust with recent advances in computer processing power and design algorithms and with the decreased cost of DNA synthesis. These breakthroughs have provided the tools to run large-scale simulations, test design hypotheses, and experimentally iterate on and confirm designs. Nonetheless, the word “design” implies the involvement of cognitive activity in determining the outcome. This is arguably the most critical and least tractable element of the approach. Although any new amino acid sequence that can be generated rationally for a protein can be considered a design, in recent years, the meaning of designing a protein “de novo” has referred largely to designs in which both the structure and the sequence are modeled and created from scratch. When both the backbone and sequence are unknown at the onset, a protein designer must creatively choose a topology and construct the proper structural elements to form the backbone.
A number of strategies to restrict the local backbone geometries to be native-like have been employed, for example, by borrowing true fragments from actual proteins to initiate the construction or extensively idealizing the peptide chain according to reliable chemical knowledge or parametric equations. While computer algorithms have largely automated specific steps of protein design, the protein designer still controls the process and makes certain that the resulting structures are coherent. But what decisions do human designers make that today’s automated algorithms do not? This question prompted the development of Foldit, a video game that applies a graphical user interface to the protein modeling suite Rosetta. In addition to serving as an excellent educational tool, Foldit aims to explore the strategies humans use to solve protein structure puzzles in hopes that these operations can be analyzed to improve or automate design algorithms. Foldit began with puzzles that challenged players to predict the folds of natural amino acid sequences (Figure 1A). Recently, it has been extended to allow players to modify previously designed proteins or design novel proteins from scratch (Figure 1B,C).

Figure 1. Foldit examples and the design steps. Foldit players have successfully completed various protein structure prediction and design problems, including (A) aiding structure prediction of Mason-Pfizer monkey virus (M-PMV),(1) (B) increasing Diels-Alderase activity by redesigning the loops around the active site of a previously designed enzyme,(2) and (C) designing a new structure from scratch. (D) The Foldit interface enables players to visualize the “score” of their designed structures as they perturb the protein backbone and (E) modify the amino acid sequence of the design.

There are three main components involved in the design of proteins: scoring metrics to guide the moves, strategies to change the structure, and sequence tweaks to improve models (Figure 1D,E).
In Foldit, the latter two are controlled by human players. There is little difference between what a player does and what a trained protein designer might do, because their objectives are the same: to follow the score provided by the force field, as it is not possible to mentally follow the entire system of thousands of atoms. In a study published in Nature, Koepnick et al. let Foldit players design a folded peptide starting from a linear chain.(3) Players were exceptionally good at exploring the conformational space, as seen in an early iteration of the game, where the players’ structures were truly novel and expressive. While many of these creative models would not likely fold to their target structures, the real implication of this crowdsourced brilliance is that every aspect of the Rosetta scoring function is now being tested, and exploited, in unintended ways by players seeking a better score. Fixing scoring deficits identified by players will eventually make the scoring metric more robust. Indeed, in subsequent rounds, Foldit was configured to enforce packing and backbone regularization rules; remarkably, these improvements provided sound guidance, and the citizen scientists were able to design proteins at the same level of accuracy as expert designers who are trained in structural biology. Perhaps not surprisingly, with the imposition of build rules, the models produced in Foldit are no longer shockingly different from designs that trained experts have long been able to produce. However, for nonscientists to achieve these novel designs by simply maximizing the game score, the Foldit experiment shows that the scoring scheme (i.e., the Rosetta force field) must be remarkably robust. By specifying the secondary structure content required or other more general rules, the scientists behind Foldit also seem to be able to guide players into creating a wide variety of structures within specific folds.
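The three components named above (a scoring metric, moves that change the structure, and sequence tweaks) can be illustrated with a toy score-guided loop. This is only a minimal sketch under loud assumptions: `toy_score`, `perturb_backbone`, `mutate_sequence`, and `design` are hypothetical names invented for illustration, the "backbone" is a bare list of angles, and the score is an arbitrary stand-in that has nothing to do with the actual Rosetta force field or with Foldit's code. A random proposal with Metropolis-style acceptance takes the place of the human player's choices.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVW")  # simplified, assumed classification


def toy_score(angles, sequence, target_angles, buried):
    """Hypothetical score (lower is better); NOT the Rosetta energy function."""
    geometry = sum((a - t) ** 2 for a, t in zip(angles, target_angles))
    # One penalty point for each residue whose polarity does not suit its site.
    mismatch = sum((res in HYDROPHOBIC) != is_buried
                   for res, is_buried in zip(sequence, buried))
    return geometry + mismatch


def perturb_backbone(angles, rng, step=0.3):
    """Structure move: nudge one randomly chosen angle."""
    new_angles = list(angles)
    i = rng.randrange(len(new_angles))
    new_angles[i] += rng.uniform(-step, step)
    return new_angles


def mutate_sequence(sequence, rng):
    """Sequence tweak: replace one randomly chosen residue."""
    i = rng.randrange(len(sequence))
    return sequence[:i] + rng.choice(AMINO_ACIDS) + sequence[i + 1:]


def design(target_angles, buried, steps=2000, temperature=0.1, seed=0):
    """Follow the score with random moves; return the best model seen."""
    rng = random.Random(seed)
    n = len(target_angles)
    angles, sequence = [0.0] * n, "G" * n
    score = toy_score(angles, sequence, target_angles, buried)
    start_score = best_score = score
    best_angles, best_sequence = angles, sequence
    for _ in range(steps):
        # Propose either a structure move or a sequence tweak.
        if rng.random() < 0.5:
            cand_angles, cand_sequence = perturb_backbone(angles, rng), sequence
        else:
            cand_angles, cand_sequence = angles, mutate_sequence(sequence, rng)
        cand_score = toy_score(cand_angles, cand_sequence, target_angles, buried)
        delta = cand_score - score
        # Metropolis rule: always accept improvements, sometimes accept setbacks.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, sequence, score = cand_angles, cand_sequence, cand_score
            if score < best_score:
                best_score, best_angles, best_sequence = score, angles, sequence
    return {"start_score": start_score, "best_score": best_score,
            "angles": best_angles, "sequence": best_sequence}


if __name__ == "__main__":
    result = design([1.0, -1.0, 0.5, 0.0], [True, False, True, False])
    print(result["start_score"], result["best_score"], result["sequence"])
```

In this sketch the proposal step is blind, which is exactly the part that Foldit hands to people: a player chooses which move to try next, while the score plays the same guiding role as `toy_score` here.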
The quality of the models seems only as good as the rules set by the scientists.(4) It will be fascinating to see how this interplay between knowledge-derived rules and human creativity can be harnessed to advance science. Automated computer algorithms today cannot carry out design tasks the way human Foldit players do; without human guidance, the sampling needed to produce a viable structure would take far too long. How do we leverage these impressive results from citizen scientists to improve design algorithms? Crowdsourcing the “human neural network” through Foldit games can efficiently sample the protein fold space and expose flaws in the simulation; a highly refined yet robust scoring scheme might also help in improving artificial neural networks for the task, as the field has started to pay attention to these new approaches.(5) Foldit will continue to push the field toward solving complex modeling problems with creative human solutions, creating paths where new algorithms may follow. However, with machine learning agents beating humans in Go, chess, and video games, it would not come as a surprise if someday computers also beat humans at protein design.

P.-S.H. and K.A.T. are supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery Through Advanced Computing (SciDAC) program, and the Schools of Medicine and Engineering of Stanford University.

The authors declare no competing financial interest.

The authors thank Brian Koepnick, Ian Haydon, Paul Nuyujukian, and Ross Venook for suggestions and feedback.

This article references 5 other publications.

Updated: 2019-12-17