Abstract
Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure that executes this function, and find a sequence that folds into this structure. This ‘central dogma’ underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and on our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches that model structure efficiently and accurately and that enrich for successful designs have enabled progression beyond the design of protein structures towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider their implications for future challenges. These include fundamental capabilities, such as sequence and structure co-design and conformational control that accounts for flexibility, and functional objectives, such as antibody and enzyme design.
References
Chothia, C. Principles that determine the structure of proteins. Annu. Rev. Biochem. 53, 537–572 (1984).
Korendovych, I. V. & DeGrado, W. F. De novo protein design, a retrospective. Q. Rev. Biophys. 53, e3 (2020).
Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
Baker, D. What has de novo protein design taught us about protein folding and biophysics? Protein Sci. 28, 678–683 (2019).
DeGrado, W. F., Summa, C. M., Pavone, V., Nastri, F. & Lombardi, A. De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68, 779–819 (1999).
Regan, L. & DeGrado, W. F. Characterization of a helical protein designed from first principles. Science 241, 976–978 (1988).
Harbury, P. B., Plecs, J. J., Tidor, B., Alber, T. & Kim, P. S. High-resolution protein design with backbone freedom. Science 282, 1462–1467 (1998).
Dahiyat, B. I. & Mayo, S. L. De novo protein design: fully automated sequence selection. Science 278, 82–87 (1997).
Dahiyat, B. I. & Mayo, S. L. Protein design automation. Protein Sci. 5, 895–903 (1996).
Walsh, S. T. R., Cheng, H., Bryson, J. W., Roder, H. & DeGrado, W. F. Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl Acad. Sci. USA 96, 5486–5491 (1999).
Levinthal, C. Are there pathways for protein folding? J. Chim. Phys. 65, 44–45 (1968).
Maynard Smith, J. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970).
Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).
Gront, D., Kulp, D. W., Vernon, R. M., Strauss, C. E. M. & Baker, D. Generalized fragment picking in Rosetta: design, protocols and applications. PLoS ONE 6, e23294 (2011).
Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Shapovalov, M. V. & Dunbrack, R. L. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19, 844–858 (2011).
Leman, J. K. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).
Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).
Vorobieva, A. A. et al. De novo design of transmembrane β barrels. Science 371, eabc8182 (2021).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Preprint at bioRxiv https://doi.org/10.1101/2023.10.09.561603 (2023).
Sheffler, W. et al. Fast and versatile sequence-independent protein docking for nanomaterials design using RPXDock. PLoS Comput. Biol. 19, e1010680 (2023).
Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).
Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. In Proceedings of the 40th International Conference on Machine Learning (eds. Krause, A. et al.) Vol. 202, 20978–21002 (PMLR, 2023); https://proceedings.mlr.press/v202/lin23a.html
Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15611 (2022).
Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proceedings of the 40th International Conference on Machine Learning (eds. Krause, A. et al.) Vol. 202, 40001–40039 (PMLR, 2023); https://proceedings.mlr.press/v202/yim23a.html
Bose, J. A. et al. SE(3)-stochastic flow matching for protein backbone generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.02391 (2024).
Yim, J. et al. Fast protein backbone generation with SE(3) flow matching. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.05297 (2023).
Fu, C. et al. A latent diffusion model for protein structure generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.04120 (2023).
Liu, Y., Chen, L. & Liu, H. Diffusion in a quantized vector space generates non-idealized protein structures and predicts conformational distributions. Preprint at bioRxiv https://doi.org/10.1101/2023.11.18.567666 (2023).
Strokach, A., Becerra, D., Corbi-Verge, C., Perez-Riba, A. & Kim, P. M. Fast and flexible protein design using deep graph neural networks. Cell Syst. 11, 402–411 (2020).
Ingraham, J., Garg, V., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) Vol. 32. (Curran Associates, 2019); https://proceedings.neurips.cc/paper_files/paper/2019/file/f3a4ff4839c56a5f460c88cce3666a2b-Paper.pdf
Gao, Z. et al. PiFold: toward effective and efficient protein inverse folding. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.12643 (2022).
Yi, K. et al. Graph denoising diffusion for inverse protein folding. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.16819 (2023).
Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proceedings of the 39th International Conference on Machine Learning (eds. Chaudhuri, K. et al.) Vol. 162, 8946–8970 (PMLR, 2022); https://proceedings.mlr.press/v162/hsu22a.html
Xiong, P. et al. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 36, 136–144 (2020).
Liu, Y. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat. Comput. Sci. 2, 451–462 (2022).
Heinzinger, M. et al. ProstT5: bilingual language model for protein sequence and structure. Preprint at bioRxiv https://doi.org/10.1101/2023.07.23.550085 (2023).
Su, J. et al. SaProt: protein language modeling with structure-aware vocabulary. Preprint at bioRxiv https://doi.org/10.1101/2023.10.01.560349 (2023).
Gruver, N. et al. Protein design with guided discrete diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.20009 (2023).
Repecka, D. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat. Mach. Intell. 3, 324–333 (2021).
Greener, J. G., Moffat, L. & Jones, D. T. Design of metalloproteins and novel protein folds using variational autoencoders. Sci. Rep. 8, 16189 (2018).
Jin, W., Wohlwend, J., Barzilay, R. & Jaakkola, T. Iterative refinement graph neural network for antibody sequence–structure co-design. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.04624 (2022).
Martinkus, K. et al. AbDiffuser: full-atom generation of in-vitro functioning antibodies. Preprint at arXiv https://doi.org/10.48550/arXiv.2308.05027 (NeurIPS, 2023).
Luo, S. et al. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In Advances in Neural Information Processing Systems (eds. Koyejo, S. et al.) Vol. 35, 9754–9767 (Curran Associates, 2022); https://proceedings.neurips.cc/paper_files/paper/2022/file/3fa7d76a0dc1179f1e98d1bc62403756-Paper-Conference.pdf
Davison, J. Zero-shot learning in modern NLP. Joe Davison Blog joeddav.github.io/blog/2020/05/29/ZSL.html (2020).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems (eds. Bengio, S. et al.) Vol. 31 (Curran Associates, 2018); https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf
Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M. & Le, M. Flow matching for generative modeling. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.02747 (2023).
Liu, X., Gong, C. & Liu, Q. Flow straight and fast: learning to generate and transfer data with rectified flow. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.03003 (2022).
Albergo, M. S., Boffi, N. M. & Vanden-Eijnden, E. Stochastic interpolants: a unifying framework for flows and diffusions. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.08797 (2023).
Somnath, V. R. et al. Aligned diffusion Schrödinger bridges. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.11419 (2023).
Lo Conte, L., Chothia, C. & Janin, J. The atomic structure of protein–protein recognition sites. J. Mol. Biol. 285, 2177–2198 (1999).
Woolfson, D. N. A brief history of de novo protein design: minimal, rational, and computational. J. Mol. Biol. 433, 167160 (2021).
Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368, eaay5051 (2020).
Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).
Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).
Scott, A. J. et al. Constructing ion channels from water-soluble α-helical barrels. Nat. Chem. 13, 643–650 (2021).
Mahendran, K. R. et al. A monodisperse transmembrane α-helical peptide barrel. Nat. Chem. 9, 411–419 (2017).
Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).
Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).
Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).
Eguchi, R. R. et al. Deep generative design of epitope-specific binding proteins by latent conformation optimization. Preprint at bioRxiv https://doi.org/10.1101/2022.12.22.521698 (2022).
Glasscock, C. J. et al. Computational design of sequence-specific DNA-binding proteins. Preprint at bioRxiv https://doi.org/10.1101/2023.09.20.558720 (2023).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023).
Torres, S. V. et al. De novo design of high-affinity binders of bioactive helical peptides. Nature https://doi.org/10.1038/s41586-023-06953-1 (2023).
Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
Chu, A. E., Fernandez, D., Liu, J., Eguchi, R. R. & Huang, P.-S. De novo design of a highly stable ovoid TIM barrel: unlocking pocket shape towards functional design. Biodes. Res. 2022, 9842315 (2022).
Huang, P.-S. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011).
Marcos, E. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat. Struct. Mol. Biol. 25, 1028–1034 (2018).
Huang, P.-S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).
Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).
Winnifrith, A., Outeiral, C. & Hie, B. Generative artificial intelligence for de novo protein design. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.09685 (2023).
Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Baek, M. et al. Efficient and accurate prediction of protein structure using RoseTTAFold2. Preprint at bioRxiv https://doi.org/10.1101/2023.05.24.542179 (2023).
Frank, C. et al. Efficient and scalable de novo protein design using a relaxed sequence space. Preprint at bioRxiv https://doi.org/10.1101/2023.02.24.529906 (2023).
Tischer, D. et al. Design of proteins presenting discontinuous functional sites using deep learning. Preprint at bioRxiv https://doi.org/10.1101/2020.11.29.402743 (2020).
Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).
Radford, A., Metz, L. & Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint at arXiv https://doi.org/10.48550/arXiv.1511.06434 (2015).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6114 (2013).
Karras, T., Laine, S. & Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 4401–4410 (IEEE, 2019).
Anand, N. & Huang, P.-S. Generative modeling for protein structures. In Advances in Neural Information Processing Systems (eds. Bengio, S. et al.) Vol. 31 (Curran Associates, 2018); https://proceedings.neurips.cc/paper_files/paper/2018/file/afa299a4d1d8c52e75dd8a24c3ce534f-Paper.pdf
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (eds. Larochelle, H. et al.) Vol. 33, 6840–6851 (Curran Associates, 2020); https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf
Song, Y. et al. Score-based generative modeling through stochastic differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.2011.13456 (2021).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems (eds. Ranzato, M. et al.) Vol. 34, 8780–8794 (Curran Associates, 2021); https://proceedings.neurips.cc/paper_files/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15019 (2022).
Li, C. T. & Farnia, F. Mode-seeking divergences: theory and applications to GANs. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (eds. Ruiz, F., Dy, J. & van de Meent, J.-W.) Vol. 206, 8321–8350 (PMLR, 2023); https://proceedings.mlr.press/v206/ting-li23a.html
Lee, J. S., Kim, J. & Kim, P. M. Score-based generative modeling for de novo protein design. Nat. Comput. Sci. 3, 382–392 (2023).
Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
Chu, A. E., Cheng, L., Nesr, G. E., Xu, M. & Huang, P.-S. An all-atom protein generative model. Preprint at bioRxiv https://doi.org/10.1101/2023.05.24.542194 (2023).
Basanta, B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl Acad. Sci. USA 117, 22135–22145 (2020).
Mravic, M. et al. Packing of apolar side chains enables accurate design of highly stable membrane proteins. Science 363, 1418–1423 (2019).
Sumida, K. H. et al. Improving protein expression, stability, and function with ProteinMPNN. Preprint at bioRxiv https://doi.org/10.1101/2023.10.03.560713 (2023).
Koga, R. et al. Robust folding of a de novo designed ideal protein even with most of the core mutated to valine. Proc. Natl Acad. Sci. USA 117, 31149–31156 (2020).
Norn, C. et al. Protein sequence design by conformational landscape optimization. Proc. Natl Acad. Sci. USA 118, e2017228118 (2021).
Goverde, C. A., Wolf, B., Khakzad, H., Rosset, S. & Correia, B. E. De novo protein design by inversion of the AlphaFold structure prediction network. Protein Sci. 32, e4653 (2023).
Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Yang, K. K., Zanichelli, N. & Yeh, H. Masked inverse folding with sequence transfer for protein representation learning. Protein Eng. Des. Sel. 36, gzad015 (2023).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Jeliazkov, J. R., del Alamo, D. & Karpiak, J. D. ESMFold hallucinates native-like protein sequences. In NeurIPS Workshop on Machine Learning in Structural Biology. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.541774 (2023).
Rettie, S. A. et al. Cyclic peptide structure prediction and design using AlphaFold. Preprint at bioRxiv https://doi.org/10.1101/2023.02.25.529956 (2023).
Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).
Gazizov, A., Lian, A., Goverde, C., Ovchinnikov, S. & Polizzi, N. F. AF2BIND: predicting ligand-binding sites using the pair representation of AlphaFold2. Preprint at bioRxiv https://doi.org/10.1101/2023.10.15.562410 (2023).
Fleishman, S. J. & Baker, D. Role of the biomolecular energy gap in protein design, structure, and evolution. Cell 149, 262–273 (2012).
Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
Nijkamp, E., Ruffolo, J., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 (2023).
Ferruz, N., Schmidt, S. & Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 13, 4348 (2022).
Alamdari, S. et al. Protein generation with evolutionary diffusion: sequence is all you need. Preprint at bioRxiv https://doi.org/10.1101/2023.09.11.556673 (2023).
Kamisetty, H., Ovchinnikov, S. & Baker, D. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proc. Natl Acad. Sci. USA 110, 15674–15679 (2013).
Rao, R. et al. Evaluating protein transfer learning with TAPE. In Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) Vol. 32 (Curran Associates, 2019); https://proceedings.neurips.cc/paper_files/paper/2019/file/37f65c068b7723cd7809ee2d31d7861c-Paper.pdf
Vig, J. et al. BERTology meets biology: interpreting attention in protein language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2006.15222 (2021).
Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669 (2021).
Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Hie, B. et al. A high-level programming language for generative protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521526 (2022).
Mackenzie, C. O., Zhou, J. & Grigoryan, G. Tertiary alphabet for the observable protein structural universe. Proc. Natl Acad. Sci. USA 113, E7438–E7447 (2016).
Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).
Shin, J. E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).
Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. In Proceedings of the 36th International Conference on Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) Vol. 97, 773–782 (PMLR, 2019); https://proceedings.mlr.press/v97/brookes19a.html
Lisanza, S. L. et al. Joint generation of protein sequence and structure with RoseTTAFold sequence space diffusion. Preprint at bioRxiv https://doi.org/10.1101/2023.05.08.539766 (2023).
Langan, R. A. et al. De novo design of bioactive protein switches. Nature 572, 205–210 (2019).
Praetorius, F. et al. Design of stimulus-responsive two-state hinge proteins. Science 381, 754–760 (2023).
Wei, K. Y. et al. Computational design of closely related proteins that adopt two well-defined but structurally divergent folds. Proc. Natl Acad. Sci. USA 117, 7208–7215 (2020).
St-Jacques, A. D. et al. Computational remodeling of an enzyme conformational landscape for altered substrate selectivity. Nat. Commun. 14, 6058 (2023).
Pesce, F. et al. Design of intrinsically disordered protein variants with diverse structural properties. Preprint at bioRxiv https://doi.org/10.1101/2023.10.22.563461 (2023).
Leaver-Fay, A., Jacak, R., Stranges, P. B. & Kuhlman, B. A generic program for multistate protein design. PLoS ONE 6, e20937 (2011).
Wankowicz, S. A. et al. Uncovering protein ensembles: automated multiconformer model building for X-ray crystallography and cryo-EM. Preprint at bioRxiv https://doi.org/10.1101/2023.06.28.546963 (2023).
Kim, J., McFee, M., Fang, Q., Abdin, O. & Kim, P. M. Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol. Sci. 44, 175–189 (2023).
North, B., Lehmann, A. & Dunbrack, R. L. A new clustering of antibody CDR loop conformations. J. Mol. Biol. 406, 228–256 (2011).
Raybould, M. I. et al. Five computational developability guidelines for therapeutic antibody profiling. Proc. Natl Acad. Sci. USA 116, 4025–4030 (2019).
Lipsh-Sokolik, R. et al. Combinatorial assembly and design of enzymes. Science 379, 195–201 (2023).
Yeh, A. H. W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).
Jing, B. et al. EigenFold: generative protein structure prediction with diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2304.02198 (2023).
Zheng, S. et al. Towards predicting equilibrium distributions for molecular systems with deep learning. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.05445 (2023).
Abdin, O. & Kim, P. M. PepFlow: direct conformational sampling from peptide energy landscapes through hypernetwork-conditioned diffusion. Preprint at bioRxiv https://doi.org/10.1101/2023.06.25.546443 (2023).
Wallner, B. AFsample: improving multimer prediction with AlphaFold using massive sampling. Bioinformatics 39, btad573 (2023).
Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature https://doi.org/10.1038/s41586-023-06832-9 (2023).
Khakzad, H. et al. A new age in protein design empowered by deep learning. Cell Syst. 14, 925–939 (2023).
Minami, S. et al. Exploration of novel αβ-protein folds through de novo design. Nat. Struct. Mol. Biol. 30, 1132–1140 (2023).
Bonet, J. et al. Rosetta FunFolDes — a general framework for the computational design of functional proteins. PLoS Comput. Biol. 14, e1006623 (2018).
Dieleman, S. Diffusion models are autoencoders. https://sander.ai/2022/01/31/diffusion.html (2022).
Boyken, S. E. et al. De novo design of tunable, pH-driven conformational changes. Science 364, 658–664 (2019).
Bethel, N. P. et al. Precisely patterned nanofibres made from extendable protein multiplexes. Nat. Chem. 15, 1664–1671 (2023).
Kurihara, K. et al. Crystal structure and activity of a de novo enzyme, ferric enterobactin esterase Syn-F4. Proc. Natl Acad. Sci. USA 120, e2218281120 (2023).
Naudin, E. A. et al. Acyl transfer catalytic activity in de novo designed protein with N-terminus of α-helix as oxyanion-binding site. J. Am. Chem. Soc. 143, 3330–3339 (2021).
Mulligan, V. K. et al. Computational design of mixed chirality peptide macrocycles with internal symmetry. Protein Sci. 29, 2433–2445 (2020).
Acknowledgements
We thank S. Ovchinnikov for feedback on the manuscript. For readers interested in more depth on physics-based modeling approaches, such as Rosetta, we recommend other reviews (refs. 2,3,17). We also recommend two recent reviews with related perspectives that focus more on the details and impact of machine learning on protein engineering and design, especially on direct sequence modeling (refs. 73,145). A.E.C. is supported by the NSF GRFP and the Merck SEEDS Program. T.L. is supported by a Stanford Graduate Fellowship. P.-S.H. is supported by the NIH (R01GM147893), the American Cancer Society (ACS 134055-IRG-218), the BASF CARA project and the Discovery Innovation Fund.
Author information
Contributions
Planning, figure production and writing of the Review: all authors. Reference curation: A.E.C. and T.L. Supplementary tables: A.E.C. and T.L.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Biotechnology thanks Philip Kim and Kevin Yang for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Tables 1 and 2
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chu, A.E., Lu, T. & Huang, PS. Sparks of function by de novo protein design. Nat Biotechnol 42, 203–215 (2024). https://doi.org/10.1038/s41587-024-02133-2
DOI: https://doi.org/10.1038/s41587-024-02133-2