Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration
BMC Genomics ( IF 3.5 ) Pub Date : 2020-11-23 , DOI: 10.1186/s12864-020-07166-w
Xin Liu 1 , Liang Wang 1, 2 , Jian Li 3 , Junfeng Hu 1 , Xiao Zhang 1

Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases, such as Type 2 Diabetes Mellitus and different types of cancer. Compared with experimental identification of malonylation sites, computational methods are time-efficient and comparatively inexpensive. In this study, we propose a novel computational model called Mal-Prec (Malonylation Prediction) that predicts malonylation sites by combining Principal Component Analysis (PCA) with a Support Vector Machine (SVM). Sequence features were first extracted using one-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs. PCA was then applied to select an optimal feature subset, and an SVM was used to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec achieves better prediction performance than other approaches. The AUC (area under the receiver operating characteristic curve) reached 96.47% on 5-fold cross-validation and 90.72% on independent data sets. Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec , together with the data sets used in this study.
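
To make the feature-extraction step more concrete, the following is a minimal Python sketch of two of the encodings named in the abstract: one-hot encoding and the composition of k-spaced amino acid pairs. It is not the authors' MATLAB code. The function names, the `k_max` default, the 20-letter alphabet, and the choice to ignore non-standard residues are all assumptions made for illustration. The physicochemical-property features are omitted because they depend on an external property table, and the PCA and SVM stages are not shown.

```python
from itertools import product

# Assumption: only the 20 standard residues are encoded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Return a flat binary vector with 20 entries per residue.

    Residues outside AMINO_ACIDS (e.g. a padding symbol) become all zeros.
    """
    vec = []
    for res in peptide:
        vec.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count every ordered pair (peptide[i], peptide[i + k + 1])
    and divide by the number of positions considered, len(peptide) - k - 1.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    index = {pair: i for i, pair in enumerate(pairs)}
    features = []
    for k in range(k_max + 1):
        counts = [0.0] * len(pairs)
        n_positions = len(peptide) - k - 1
        for i in range(max(n_positions, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in index:
                counts[index[pair]] += 1.0
        if n_positions > 0:
            counts = [c / n_positions for c in counts]
        features.extend(counts)
    return features


def encode(peptide, k_max=2):
    """Concatenate both encodings into one feature vector."""
    return one_hot(peptide) + cksaap(peptide, k_max)


# Hypothetical 5-residue window centred on a lysine (K).
vector = encode("ACKDE")
print(len(vector))
```

In a pipeline like the one the abstract describes, vectors of this kind would be stacked into a matrix, reduced with PCA, and passed to an SVM classifier; those steps would need a numerical library and are left out of this sketch.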

Updated: 2020-11-23