Low-N protein engineering with data-efficient deep learning
bioRxiv - Synthetic Biology. Pub Date: 2020-08-31, DOI: 10.1101/2020.01.23.917682
Surojit Biswas, Grigory Khimulya, Ethan C. Alley, Kevin M. Esvelt, George M. Church

Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two highly dissimilar proteins, avGFP and TEM-1 β-lactamase, top candidates from a single round are diverse and as active as engineered mutants obtained from previous multi-year, high-throughput efforts. Because it distills information from both global and local sequence landscapes, our model approximates protein function even before receiving experimental data, and generalizes from only single mutations to propose high-functioning epistatically non-trivial designs. With reproducible >500% improvements in activity from a single assay in a 96-well plate, we demonstrate the strongest generalization observed in machine learning-guided protein function optimization to date. Taken together, our approach enables efficient use of resource-intensive high-fidelity assays without sacrificing throughput, and helps to accelerate engineered proteins into the fermenter, field, and clinic.
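
To make the idea of searching a virtual fitness landscape concrete, the following is a minimal toy sketch, not the authors' method. It assumes a simple additive per-site model fitted to a handful of assayed single mutants as a stand-in for the learned model, and a Metropolis-style acceptance rule as a generic in silico search. The function names, the placeholder sequence, and the activity values are all hypothetical and chosen only for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_additive_model(wild_type, wt_score, assayed):
    """Estimate a per-(position, residue) effect from assayed single mutants.

    `assayed` maps a mutant sequence to its measured activity. This additive
    table is a toy stand-in for a learned fitness model (an assumption).
    """
    effects = {}
    for seq, score in assayed.items():
        diffs = [i for i, (a, b) in enumerate(zip(wild_type, seq)) if a != b]
        if len(diffs) == 1:  # only single mutants inform this toy model
            effects[(diffs[0], seq[diffs[0]])] = score - wt_score
    return effects


def predict(seq, wild_type, wt_score, effects):
    """Predicted activity: wild-type score plus the sum of known effects."""
    total = wt_score
    for i, (a, b) in enumerate(zip(wild_type, seq)):
        if a != b:
            total += effects.get((i, b), 0.0)  # unseen mutations count as neutral
    return total


def directed_evolution(wild_type, wt_score, effects, steps=2000,
                       temperature=0.05, seed=0):
    """Metropolis-style search over point mutations, scored by the toy model."""
    rng = random.Random(seed)
    current = wild_type
    current_score = predict(current, wild_type, wt_score, effects)
    best, best_score = current, current_score
    for _ in range(steps):
        pos = rng.randrange(len(current))
        residue = rng.choice(AMINO_ACIDS)
        proposal = current[:pos] + residue + current[pos + 1:]
        score = predict(proposal, wild_type, wt_score, effects)
        # Always accept non-worsening moves; accept losses with Boltzmann probability.
        delta = score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = proposal, score
    return best, best_score


if __name__ == "__main__":
    wild_type = "ACDEFGHIK"  # placeholder sequence, not a real protein
    wt_score = 1.0           # made-up baseline activity
    assayed = {"ACDEFGHIW": 1.4, "AYDEFGHIK": 0.6, "ACDEYGHIK": 1.2}  # made-up values
    effects = fit_additive_model(wild_type, wt_score, assayed)
    print(directed_evolution(wild_type, wt_score, effects))
```

An additive table cannot capture epistasis, which is precisely what the abstract says the learned model handles; the sketch only shows the loop of fitting on a few measurements and then proposing and scoring many candidates computationally.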

Updated: 2020-09-01