Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data.
BMC Molecular and Cell Biology (IF 2.4), Pub Date: 2019-06-28, DOI: 10.1186/s12860-019-0200-9
Rajaram Gana, Sona Vasudevan

BACKGROUND To date, no one has claimed to have found a consensus sequon for O-glycosylation. Predicting the likelihood of O-glycosylation from sequence and structural information with classical regression analysis is therefore quite difficult. In particular, if a binary response is used to distinguish O-glycosylated from non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find.
RESULTS Sequences from three similar post-translational modifications (PTMs) of proteins that occur at, or very near, the S/T site are analyzed: N-glycosylation, mucin-type O-glycosylation (O-GalNAc), and phosphorylation. The findings include:
1) The consensus composite sequon for O-glycosylation is ~(W-S/T-W), where "~" denotes the "not" operator.
2) The consensus sequon for phosphorylation is ~(W-S/T/Y/H-W), although W-S/T/Y/H-W is not an absolute inhibitor of phosphorylation.
3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations to non-O-glycosylatable sequences, although N-~P-S/T is not an absolute inhibitor of O-glycosylation.
4) The selective positioning of amino acids along the sequence differentiates the PTMs of proteins.
5) Some N-glycosylated sequences are also phosphorylated at the S/T site of the N-~P-S/T sequon.
6) Accessible surface area (ASA) values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences.
7) Structural attributes (beta turn types II and II′, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation. The LPM with sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%.
8) With sequence data alone, the KS statistic drops to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not glycosylated. The 95% confidence interval for this misprediction rate is 16% to 26%.
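The sequon rules in results 1), 2), 3), and 5) are simple pattern constraints on the residues around a site. Below is a minimal Python sketch, assuming single-letter amino-acid strings and reading ~(W-S/T-W) as "an S/T residue that is not flanked by W on both sides". The function names and the example sequence are hypothetical. The sketch only screens candidate sites and is not the authors' pipeline; as results 2) and 3) note, these patterns are not absolute inhibitors, so a match does not by itself predict modification.

```python
import re
from typing import List

def sites_not_boxed(seq: str, residues: str, flank: str = "W") -> List[int]:
    """0-based indices of residues in `residues` that do NOT sit inside a
    flank-X-flank triplet, i.e. sites satisfying ~(W-X-W)."""
    sites = []
    for i, aa in enumerate(seq):
        if aa not in residues:
            continue
        # A site at either end of the sequence cannot be flanked on both sides.
        boxed = 0 < i < len(seq) - 1 and seq[i - 1] == flank and seq[i + 1] == flank
        if not boxed:
            sites.append(i)
    return sites

def o_glyc_candidates(seq: str) -> List[int]:
    # Composite O-glycosylation sequon ~(W-S/T-W).
    return sites_not_boxed(seq, "ST")

def phos_candidates(seq: str) -> List[int]:
    # Phosphorylation sequon ~(W-S/T/Y/H-W); not an absolute rule.
    return sites_not_boxed(seq, "STYH")

# N-glycosylation sequon N-~P-S/T: N, then any residue except P, then S or T.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") all match.
N_SEQUON = re.compile(r"(?=N[^P][ST])")

def n_glyc_sites(seq: str) -> List[int]:
    """0-based index of the N in each N-~P-S/T sequon; its S/T is at index + 2."""
    return [m.start() for m in N_SEQUON.finditer(seq)]

if __name__ == "__main__":
    s = "AWSWTKNGSNPT"  # made-up example sequence
    print(o_glyc_candidates(s))  # [4, 8, 11]: the S at index 2 sits in W-S-W
    print(n_glyc_sites(s))       # [6]: N-G-S matches, N-P-T does not
```

Because `n_glyc_sites` returns the N position, the S/T site of each N-sequon is at `i + 2`; intersecting those positions with `phos_candidates` gives the kind of overlap described in result 5).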
CONCLUSIONS The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation.
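The method named in the title, a linear probability model estimated by ridge regression and evaluated with a KS statistic, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the abstract does not give the feature encoding, the ridge penalty, the classification cutoff, or how the confidence interval was computed. The closed-form ridge solve with an unpenalized intercept, the hypothetical 0.5 cutoff, and the Wald (normal-approximation) interval below are therefore assumptions, and all function names are hypothetical. The demo uses synthetic data, not the paper's.

```python
import numpy as np

def fit_ridge_lpm(X, y, lam=1.0):
    """Ridge-estimated LPM: least squares on 0/1 labels with an L2 penalty.
    X and y are centered so the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Closed form: beta = (Xc'Xc + lam*I)^-1 Xc'(y - mean(y))
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict_lpm(model, X):
    """Linear 'probability' scores; an LPM can return values outside [0, 1]."""
    intercept, beta = model
    return intercept + np.asarray(X, dtype=float) @ beta

def ks_statistic(scores, y):
    """Two-sample KS statistic: the largest gap between the empirical score
    CDFs of class 1 and class 0, evaluated at every observed score."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y).astype(bool)
    grid = np.unique(scores)
    pos, neg = np.sort(scores[y]), np.sort(scores[~y])
    cdf_pos = np.searchsorted(pos, grid, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, grid, side="right") / len(neg)
    return float(np.max(np.abs(cdf_neg - cdf_pos)))

def miss_rate_ci(n_missed, n_positive, z=1.96):
    """Misprediction rate with a Wald 95% interval, clipped to [0, 1]."""
    p = n_missed / n_positive
    half = z * np.sqrt(p * (1 - p) / n_positive)
    return p, max(0.0, p - half), min(1.0, p + half)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data (NOT the paper's): 200 sites, 5 features.
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
    model = fit_ridge_lpm(X[:150], y[:150], lam=1.0)  # in-sample fit
    scores = predict_lpm(model, X[150:])               # out-of-sample scores
    y_test = y[150:]
    print("KS:", ks_statistic(scores, y_test))
    missed = int(np.sum(scores[y_test == 1] < 0.5))    # hypothetical 0.5 cutoff
    print("miss rate, CI:", miss_rate_ci(missed, int(np.sum(y_test == 1))))
```

A KS statistic reported as "99%" corresponds to a value of 0.99 here: the score distributions of the two classes are almost completely separated. The closed-form solve keeps the sketch dependency-free beyond NumPy; a library ridge estimator would work equally well.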

Updated: 2019-06-28