A Novel Method for Identification of Glutarylation Sites Combining Borderline-SMOTE With Tomek Links Technique in Imbalanced Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-07-08, DOI: 10.1109/tcbb.2021.3095482
Qiao Ning, Xiaowei Zhao, Zhiqiang Ma

Glutarylation is a post-translational modification that occurs on lysine residues, and it plays an irreplaceable role in various cellular functions. Identifying glutarylation sites is therefore important for understanding the molecular mechanism of glutarylation. In this study, we propose a method named DEXGB_Glu that identifies lysine glutarylation sites using an XGBoost classifier optimized by a differential evolution algorithm. To address the imbalance between positive and negative samples, the Borderline-SMOTE method was used to synthesize positive samples until their number equaled that of the negative samples. The Tomek links technique was then applied to filter out noisy data. Analysis of the method and its results showed that the differential evolution algorithm clearly improved performance, and that the combination of Borderline-SMOTE and Tomek links effectively addressed the imbalance between positive and negative samples. Finally, the method performed much better than other methods in predicting glutarylation sites. The data and code are available at https://github.com/ningq669/DEXGB_Glu.
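
The resampling step described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (their code is in the repository linked above); it only shows the general idea of Borderline-SMOTE followed by Tomek link removal on a toy dataset. The function names `borderline_smote` and `remove_tomek_links`, the parameters `m`, `k`, and `seed`, and the toy coordinates are all assumptions made for illustration. Removing both members of each Tomek link is also an assumption, since the abstract does not say which samples are dropped. The XGBoost classifier and its differential evolution tuning are left out.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def _nearest(i, X, candidates, k):
    """Indices of the k candidates closest to X[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: _dist(X[i], X[j]))[:k]


def borderline_smote(X, y, minority=1, m=5, k=5, seed=0):
    """Oversample borderline minority points until both classes are equal.

    Hypothetical helper: a minority point is treated as borderline
    ("danger") when at least half, but not all, of its m nearest
    neighbours belong to the majority class.
    """
    rng = random.Random(seed)
    everyone = range(len(X))
    min_idx = [i for i in everyone if y[i] == minority]
    n_needed = (len(X) - len(min_idx)) - len(min_idx)

    danger = []
    for i in min_idx:
        nbrs = _nearest(i, X, everyone, m)
        n_major = sum(1 for j in nbrs if y[j] != minority)
        if len(nbrs) / 2 <= n_major < len(nbrs):
            danger.append(i)

    synthetic = []
    while danger and len(min_idx) > 1 and len(synthetic) < n_needed:
        p = rng.choice(danger)
        q = rng.choice(_nearest(p, X, min_idx, k))  # a minority neighbour
        gap = rng.random()
        # Interpolate between the borderline point and its neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(X[p], X[q])])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


def remove_tomek_links(X, y):
    """Drop both points of every Tomek link (assumed cleaning policy).

    A Tomek link is a pair of points from different classes that are
    each other's nearest neighbour.
    """
    idx = range(len(X))
    nn = [_nearest(i, X, idx, 1)[0] for i in idx]
    drop = {i for i in idx if nn[nn[i]] == i and y[nn[i]] != y[i]}
    keep = [i for i in idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: six negative (0) points on a line, three positive (1) points above.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0],
     [2.5, 1.0], [2.5, 2.0], [2.5, 3.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_over, y_over = borderline_smote(X, y, m=3, k=2)
X_bal, y_bal = remove_tomek_links(X_over, y_over)
```

In practice one would more likely use a maintained library for both steps; the hand-written version above is only meant to make the two operations concrete.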

Updated: 2021-07-08