Predicting mutant outcome by combining deep mutational scanning and machine learning
Proteins: Structure, Function, and Bioinformatics (IF 2.9), Pub Date: 2021-07-22, DOI: 10.1002/prot.26184
Hagit Sarfati 1, Si Naftaly 2, Niv Papo 2, Chen Keasar 1

Deep mutational scanning provides an unprecedented wealth of quantitative data regarding the functional outcome of mutations in proteins. A single experiment may measure properties (e.g., structural stability) of numerous protein variants. Leveraging the experimental data to gain insights about unexplored regions of the mutational landscape is a major computational challenge. Such insights may facilitate further experimental work and accelerate the development of novel protein variants with beneficial therapeutic or industrially relevant properties. Here we present a novel machine learning approach for predicting the functional outcome of mutations in the context of deep mutational screens. Using sequence (one-hot) features of variants with known properties, as well as structural features derived from models of those variants, we train predictive statistical models to estimate the unknown properties of other variants. The utility of the new computational scheme is demonstrated using five sets of mutational scanning data, denoted “targets”: (a) protease specificity of APPI (amyloid precursor protein inhibitor) variants; (b-d) three stability-related properties of IGBPG (immunoglobulin G-binding β1 domain of streptococcal protein G) variants; and (e) fluorescence of GFP (green fluorescent protein) variants. Performance is measured by the overall correlation between the predicted and observed properties, and by enrichment: the ability to predict the most potent variants and presumably guide further experiments. Despite the diversity of the targets, the statistical models can generalize from the variant examples of each target and predict the properties of test variants with both single and multiple mutations.
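
To make the terms in the abstract concrete, here is a minimal Python sketch of one-hot sequence encoding and of the two kinds of performance measure it mentions, correlation and enrichment. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 25% cutoff, and in particular the enrichment formula (overlap of the top-ranked predicted and observed variants, relative to chance) are hypothetical choices, since the abstract does not define them. The structural features and the predictive models themselves are not reproduced here.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Flatten a protein sequence into a 0/1 vector of length 20 * len(sequence)."""
    vector = []
    for residue in sequence:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {residue!r}")
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def pearson(xs, ys):
    """Pearson correlation between predicted and observed property values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def enrichment(predicted, observed, top_fraction=0.25):
    """Assumed definition: fraction of the truly top-ranked variants that also
    appear among the top-ranked predictions, divided by the fraction expected
    from a random selection of the same size."""
    n = len(observed)
    k = max(1, round(n * top_fraction))
    top_pred = set(sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k])
    top_obs = set(sorted(range(n), key=lambda i: observed[i], reverse=True)[:k])
    recovered = len(top_pred & top_obs) / k
    return recovered / (k / n)
```

Under this definition, an enrichment of 1 corresponds to random selection, and larger values mean that the top predictions are disproportionately drawn from the most potent observed variants.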

Updated: 2021-07-22