Learning peptide recognition rules for a low‐specificity protein
Protein Science (IF 4.5), Pub Date: 2020-09-26, DOI: 10.1002/pro.3958
Lucas C Wheeler, Arden Perkins, Caitlyn E Wong, Michael J Harms

Many proteins interact with short linear regions of target proteins. For some of these proteins, however, it is difficult to identify a well‐defined sequence motif that defines their target peptides. To overcome this difficulty, we used supervised machine learning to train a model that treats each peptide as a collection of easily calculated biochemical features rather than as an amino acid sequence. As a test case, we dissected the peptide‐recognition rules for human S100A5 (hA5), a low‐specificity calcium‐binding protein. We trained a Random Forest model against a recently released, high‐throughput phage display dataset collected for hA5. The model identifies hydrophobicity and shape complementarity, rather than polar contacts, as the primary determinants of peptide binding specificity in hA5. We tested this hypothesis by solving a crystal structure of hA5 and by computationally docking diverse peptides onto hA5. These structural studies revealed that peptides exhibit multiple binding modes at the hA5 peptide interface, all of which have few polar contacts with hA5. Finally, we used our trained model to predict new, plausible binding targets in the human proteome. This revealed a fragment of the protein α‐1‐syntrophin that binds to hA5. Our work improves understanding of the biochemistry and biology of hA5 and demonstrates how high‐throughput experiments coupled with machine learning of biochemical features can reveal the determinants of binding specificity in low‐specificity proteins.
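
To make the idea of describing a peptide by biochemical features rather than by its sequence concrete, here is a minimal Python sketch. It is not the authors' feature set or code: the function name `peptide_features`, the four features chosen, and the example peptide are illustrative assumptions. The hydropathy values are the standard Kyte-Doolittle scale, and histidine is treated as neutral for simplicity.

```python
# Illustrative sketch only: the feature choices below are assumptions,
# not the features used in the paper.

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")   # simplification: histidine counted as neutral
NEGATIVE = set("DE")
AROMATIC = set("FWY")


def peptide_features(seq):
    """Summarize a peptide as a small dictionary of numeric features."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    unknown = set(seq) - set(KD_HYDROPATHY)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    n = len(seq)
    return {
        "length": n,
        "mean_hydropathy": sum(KD_HYDROPATHY[aa] for aa in seq) / n,
        "net_charge": sum(aa in POSITIVE for aa in seq)
                      - sum(aa in NEGATIVE for aa in seq),
        "aromatic_fraction": sum(aa in AROMATIC for aa in seq) / n,
    }


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    print(peptide_features("ACDEFGHIKL"))
```

Feature dictionaries of this kind, computed for peptides labeled as binders or non-binders, could then be passed to a supervised learner such as a Random Forest classifier. Because the model sees hydropathy, charge, and similar properties instead of residue identities, it can pick up on trends like the hydrophobicity preference reported for hA5 even when no sequence motif exists.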

Updated: 2020-10-30