Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC
Current Bioinformatics (IF 2.4), Pub Date: 2020-05-31, DOI: 10.2174/1574893614666190723114923
Saba Amanat 1 , Adeel Ashraf 1 , Waqar Hussain 1 , Nouman Rasool 2 , Yaser Daanial Khan 1

Background: Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Of these three, covalent attachment of a carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. To study the biological functions of this modification, it is essential to correctly identify the lysine sites susceptible to carboxylation.

Objective: Herein, we present a machine-learning-based computational model for predicting carboxylysine sites.

Methods: Various position- and composition-relative features are incorporated into PseAAC to construct feature vectors, and a neural network is employed as the classifier. The model is validated by jackknife testing, cross-validation, self-consistency testing, and independent testing.
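
In this literature, the jackknife test is leave-one-out validation: each sample is held out once while the model is trained on all the others, whereas k-fold cross-validation holds out one of k partitions at a time. The following is a minimal sketch of how such index splits could be generated; the function names `k_fold_splits` and `jackknife_splits` are hypothetical and are not taken from the paper, and the authors' actual partitioning procedure is not described in this abstract.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_splits(n_samples: int) -> Iterator[Split]:
    """Leave-one-out splits: k-fold with one sample per fold."""
    return k_fold_splits(n_samples, n_samples)
```

In practice the sample order would usually be shuffled before splitting so that each fold contains a mix of positive and negative sites; that step is omitted here for brevity.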

Results: The self-consistency test showed that the model achieves 99.76% Acc, 99.76% Sp, 99.76% Sp, and 0.99 MCC. Validation with the jackknife method gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc.
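
The abbreviations above are the usual binary-classification metrics: accuracy (Acc), specificity (Sp), and the Matthews correlation coefficient (MCC); sensitivity (Sn) is commonly reported alongside them. As a rough illustration of how these are derived from a confusion matrix, the sketch below uses the standard definitions. The function name `binary_metrics` and the example counts are assumptions made for illustration only and are not data from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute Acc, Sn, Sp, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    # Sensitivity: fraction of true sites that are predicted as sites.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that are predicted as non-sites.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Made-up counts, purely illustrative.
example = binary_metrics(tp=90, tn=85, fp=15, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported together with Acc when positive and negative sites are imbalanced.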

Conclusion: Independent dataset testing yielded 94.3%, which indicates that the proposed model performs better than the existing model PreLysCar; however, accuracy could be improved further in the future as the number of known carboxylysine sites in proteins increases.




Updated: 2020-05-31