Mini review
From sequence to function through structure: Deep learning for protein design

https://doi.org/10.1016/j.csbj.2022.11.014
Open access under a Creative Commons license

Abstract

The design of biomolecules, and of proteins in particular, is undergoing a rapid change in available tooling and approaches: the field is moving from design driven by physicochemical force fields to the fast generation of plausible, complex sequences by end-to-end differentiable statistical models. To achieve conditional and controllable protein design, researchers at the interface of artificial intelligence and biology leverage advances in natural language processing (NLP) and computer vision, coupled with advances in computing hardware, to learn patterns from growing biological databases, curated annotations thereof, or both. Once learned, these patterns can provide novel insights into mechanistic biology and guide the design of biomolecules. However, navigating the many recent protein design tools and understanding their practical applications is complex. To facilitate this, we 1) document advances in deep learning (DL)-assisted protein design from the last three years, 2) present a practical pipeline that goes from de novo-generated sequences to their predicted properties and web-powered visualization within minutes, and 3) leverage it to suggest a generated protein sequence that might be used to engineer a biosynthetic gene cluster to produce a molecular glue-like compound. Lastly, we discuss challenges and highlight opportunities for the protein design field.

Abbreviations

ADMM: Alternating Direction Method of Multipliers
CNN: Convolutional Neural Network
DL: Deep Learning
FNN: Fully-connected Neural Network
GAN: Generative Adversarial Network
GCN: Graph Convolutional Network
GNN: Graph Neural Network
GO: Gene Ontology
GVP: Geometric Vector Perceptron
LSTM: Long Short-Term Memory
MLP: Multilayer Perceptron
MSA: Multiple Sequence Alignment
NLP: Natural Language Processing
NSR: Natural Sequence Recovery
pLM: protein Language Model
VAE: Variational Autoencoder

Keywords

Protein design
Protein prediction
Drug discovery
Deep learning
Protein language models
