Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels
Computers in Biology and Medicine (IF 7.0), Pub Date: 2021-01-07, DOI: 10.1016/j.compbiomed.2021.104212
Trinh-Trung-Duong Nguyen, Nguyen-Quoc-Khanh Le, The-Anh Tran, Dinh-Minh Pham, Yu-Yen Ou

Glycosylation is a dynamic enzymatic process that attaches glycans to proteins or other organic molecules such as lipoproteins. Research has shown that this process plays a fundamental role in modulating the function of ion channel proteins. This study used a computational method to predict N-linked glycosylation sites, the most common type, in ion channel proteins. From segments of ion channel proteins centered on N-linked glycosylation sites, the amino acid embedding vectors of each residue were concatenated to create features for prediction. We experimented with two models for converting amino acids to their embeddings: one was trained on ion channel sequences and the other on a large dataset of more than one million protein sequences. The latter model, which follows the idea of transfer learning, emerged as the more efficient feature extractor. Our best model was obtained from this transfer learning approach and hyperparameter tuning by random search with 5-fold cross-validation. It achieved an accuracy, specificity, sensitivity, and Matthews correlation coefficient of 93.4%, 92.8%, 98.6%, and 0.726, respectively. Corresponding scores on an independent test set were 92.9%, 92.2%, 99%, and 0.717. These results outperform those obtained with position-specific scoring matrix features, which are predominantly employed in post-translational modification site prediction. Furthermore, compared with N-GlyDE, GlycoEP, and SPRINT-Gly, the most recent N-linked glycosylation site predictors, our model yields higher scores on all four metrics, further demonstrating the efficiency of our approach.
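
The feature construction and the four reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the embedding table is filled with random toy values rather than learned embeddings, and the embedding dimension, window half-width, padding symbol `X`, and function names (`toy_embeddings`, `window_features`, `binary_metrics`) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for window positions beyond the sequence ends


def toy_embeddings(dim=4, seed=0):
    """Build a stand-in embedding table with random values.

    In the study the embeddings are learned from protein sequences;
    random vectors are used here only to keep the sketch self-contained.
    """
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}
    table[PAD] = [0.0] * dim
    return table


def window_features(sequence, center, half_width, table):
    """Concatenate the embeddings of every residue in a window centered on an N."""
    if sequence[center] != "N":
        raise ValueError("window must be centered on an asparagine (N)")
    features = []
    for i in range(center - half_width, center + half_width + 1):
        residue = sequence[i] if 0 <= i < len(sequence) else PAD
        # Unknown residue codes fall back to the padding vector (an assumption).
        features.extend(table.get(residue, table[PAD]))
    return features


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    table = toy_embeddings(dim=4)
    feats = window_features("MKNVSA", center=2, half_width=3, table=table)
    print(len(feats))  # (2 * 3 + 1) residues * 4 dimensions
    print(binary_metrics(tp=50, tn=40, fp=10, fn=0))
```

The resulting feature vector has length (2 × half-width + 1) × embedding dimension and could be passed to any standard classifier. The Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is reported alongside accuracy, sensitivity, and specificity.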




Updated: 2021-01-14