Prediction of Nitration Sites Based on FCBF Method and Stacking Ensemble Model
Current Proteomics (IF 0.8) Pub Date: 2021-09-30, DOI: 10.2174/1570164618999210101222637
Min Liu¹, Lu Zhang¹, Xinyi Qin¹, Tao Huang², Ziwei Xu³, Guangzhong Liu¹

Background: Nitration is an important post-translational modification (PTM) that occurs on the tyrosine residues of proteins. Under disease conditions, protein tyrosine nitration is inevitable and marks a shift from the physiological, signal-transducing actions of nitric oxide (•NO) toward oxidative and potentially pathogenic pathways. Abnormal protein nitration can lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ transplant rejection and lung cancer.

Objective: Identifying nitration sites in protein sequences is therefore both necessary and important. Predicting which tyrosine residues in a protein sequence are nitrated, and which are not, is of great significance for studying the nitration mechanism and related diseases.

Methods: In this study, a nitration-site prediction model was proposed that fuses multiple features and combines stacking ensemble learning with an over- and under-sampling strategy and the Fast Correlation-Based Filter (FCBF) method. First, each protein sequence sample was encoded as a 2701-dimensional fused feature vector (PseAAC, PSSM, AAIndex, CKSAAP and Disorder features). Second, FCBF ranked the features by the symmetric uncertainty metric. Third, during model training, over- and under-sampling was applied to handle the imbalanced dataset. Finally, Incremental Feature Selection (IFS) with 10-fold cross-validation was used to select the optimal classifier.
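CKSAAP (composition of k-spaced amino acid pairs) is one of the five feature groups in the first Methods step. The sketch below is a minimal illustration, not the authors' encoder: it assumes each sample is a fixed-length sequence window around the candidate tyrosine, uses a placeholder `k_max=3`, and normalises each count by the number of position pairs at that gap. The abstract gives neither the window length nor the gap range, so how the 2701 dimensions split across feature groups is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns 400 * (k_max + 1) values. For each gap k, the count of each pair
    (a, b) with exactly k residues between a and b is divided by the number
    of position pairs at that gap, len(window) - k - 1. Pairs containing a
    non-standard symbol (e.g. a hypothetical padding 'X') are not counted
    but still occupy a position in the denominator.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of (i, i + k + 1) positions
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features
```

For example, `cksaap("ACDY", k_max=1)` returns 800 values, with 1/3 at each of the gap-0 pairs "AC", "CD" and "DY", and 1/2 at each of the gap-1 pairs "AD" and "CY".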
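The second Methods step ranks features by symmetric uncertainty, SU(X, Y) = 2 · IG(X; Y) / (H(X) + H(Y)), where IG is the information gain (mutual information) and H is entropy. The following is a minimal sketch of SU and of the standard FCBF pass (relevance ranking, then removal of any feature that an already-kept, more relevant feature predicts at least as well as the feature predicts the class). It assumes features have already been discretised, and the threshold `delta` is a placeholder. The abstract does not say whether IFS consumed the full SU ranking or only the FCBF-retained subset, so both are returned.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)); lies in [0, 1] for discrete data."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as uninformative
    ig = hx + hy - entropy(list(zip(x, y)))  # mutual information
    return 2.0 * ig / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """features: list of columns, each a list of discrete values.

    Returns (selected indices in ranked order, SU of every feature with
    the class).
    """
    su_class = [symmetric_uncertainty(col, labels) for col in features]
    # Relevance ranking: keep features above delta, most relevant first.
    ranked = sorted((i for i, s in enumerate(su_class) if s > delta),
                    key=lambda i: -su_class[i])
    selected = []
    for j in ranked:
        # j is redundant if a kept, more relevant feature i satisfies
        # SU(i, j) >= SU(j, class).
        if all(symmetric_uncertainty(features[i], features[j]) < su_class[j]
               for i in selected):
            selected.append(j)
    return selected, su_class
```

On toy data where two columns are identical copies of the label and a third is independent of it, FCBF keeps only the first copy and drops the independent column.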
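The third Methods step, over- and under-sampling, is not specified in detail in the abstract. One simple reading, sketched here as an assumption, meets in the middle: every class is resampled to a common target size, oversampling minority classes with replacement and undersampling majority classes without replacement. SMOTE-style synthetic oversampling would be a different technique. When combined with cross-validation, resampling should be applied to the training folds only, so duplicated positives never leak into the validation fold.

```python
import random


def over_under_sample(samples, labels, target_per_class, seed=0):
    """Resample every class to exactly `target_per_class` items.

    Classes larger than the target are randomly undersampled without
    replacement; smaller classes keep all originals plus random duplicates.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            picked = rng.sample(xs, target_per_class)  # undersample
        else:
            picked = xs + rng.choices(xs, k=target_per_class - len(xs))  # oversample
        out_x.extend(picked)
        out_y.extend([y] * target_per_class)
    # Shuffle so positives and negatives are interleaved.
    order = list(range(len(out_y)))
    rng.shuffle(order)
    return [out_x[i] for i in order], [out_y[i] for i in order]
```

With 10 negatives, 2 positives and `target_per_class=5`, the result has 5 distinct negatives and 5 positives, which are the 2 originals plus 3 duplicates.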
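The final Methods step, IFS with 10-fold cross-validation, walks down the feature ranking and scores the top-1, top-2, ... prefixes, keeping the best one. This is a minimal sketch: `evaluate` is a hypothetical caller-supplied function (for example, one that trains the classifier on each fold and returns mean MCC), because the abstract does not say which metric was maximised. The fold splitter is plain rather than stratified; with heavily imbalanced sites, a stratified split is usually the safer choice.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for plain k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def incremental_feature_selection(ranked, evaluate):
    """Score each prefix ranked[:1], ranked[:2], ... with `evaluate`
    and return (best prefix, its score).

    The strict '>' keeps the smaller subset when two prefixes tie.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked[:best_k], best_score
```

With 2701 candidate features, IFS means up to 2701 cross-validated trainings, so in practice the prefix length is often increased in larger steps.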

Results and Conclusion: The model shows significant performance advantages in MCC, Recall and F1-score, both when compared with other classifiers on the independent test set and when compared, by cross-validation on the training set, with models built on single-type features or on fused features. By integrating FCBF feature ranking, over- and under-sampling, and a stacking model composed of multiple base classifiers, an effective prediction model for nitration PTM sites was built; it achieves a better recall rate when the ratio of positive to negative samples is highly imbalanced.
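The abstract names neither the base classifiers nor the meta-learner of the stacking model. The sketch below shows only the stacking mechanism under stated assumptions. A small hand-rolled logistic regression (`SimpleLogisticRegression`, hypothetical and not a library class) serves as both base learner and meta-learner. Base-learner diversity comes from hypothetical feature "views" (for example, PSSM columns versus CKSAAP columns) rather than from different algorithms. The meta-learner is trained on out-of-fold base-learner probabilities so that it never sees predictions made on the base learners' own training data.

```python
import math
import random


class SimpleLogisticRegression:
    """Minimal batch-gradient-descent logistic regression (illustrative)."""

    def __init__(self, lr=0.1, epochs=500):
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, x):
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        z = max(-60.0, min(60.0, z))  # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        n = len(X)
        for _ in range(self.epochs):
            gw, gb = [0.0] * len(self.w), 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # gradient of log-loss
                gw = [g + err * v for g, v in zip(gw, x)]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self


def project(x, cols):
    """Restrict a feature vector to one feature group (a 'view')."""
    return [x[c] for c in cols]


def stacking_fit(X, y, views, k=5, seed=0):
    """One base learner per view; meta-learner on out-of-fold probabilities."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    meta_X = [[0.0] * len(views) for _ in X]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        for v, cols in enumerate(views):
            model = SimpleLogisticRegression().fit(
                [project(X[i], cols) for i in train], [y[i] for i in train])
            for i in folds[f]:  # predict only on the held-out fold
                meta_X[i][v] = model.predict_proba(project(X[i], cols))
    # Refit each base learner on all data for use at prediction time.
    base = [SimpleLogisticRegression().fit([project(x, cols) for x in X], y)
            for cols in views]
    meta = SimpleLogisticRegression().fit(meta_X, y)
    return base, meta


def stacking_predict(base, meta, views, x):
    """Probability that sample x is positive (e.g. a nitrated tyrosine)."""
    z = [m.predict_proba(project(x, cols)) for m, cols in zip(base, views)]
    return meta.predict_proba(z)
```

In a real pipeline these learners would be swapped for stronger, heterogeneous classifiers, but the out-of-fold construction of the meta-features stays the same.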
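The results are reported as MCC, Recall and F1-score rather than accuracy because nitration sites are a small minority of tyrosines. A short sketch of the standard definitions, computed from confusion-matrix counts with "positive" meaning a nitrated site:

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1, "mcc": mcc}
```

With a hypothetical 10 positives among 1000 samples, a model that labels every site as non-nitrated scores an accuracy of 0.99 but a recall and MCC of 0. This is why the imbalance-sensitive metrics are the ones worth comparing.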

Updated: 2021-11-23