Interpreting a black box predictor to gain insights into early folding mechanisms
Computational and Structural Biotechnology Journal ( IF 4.4 ) Pub Date : 2021-08-27 , DOI: 10.1016/j.csbj.2021.08.041
Isel Grau 1 , Ann Nowé 1, 2 , Wim Vranken 1, 2, 3, 4

Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are available for only a few proteins, and these data previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow direct extraction of the 'early folding rules' embedded in the protein sequence, yet this interpretation is essential to improve our understanding of how the folding process works. Here we apply and investigate a novel 'grey box' approach to predicting EFRs from protein sequence, to gain mechanistic residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets: a default set composed of natural proteins, a scrambled set composed of the scrambled default set sequences, and a set of designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online as a resource to help understand and steer early protein folding.

Updated: 2021-08-27