iT3SE-PX: Identification of Bacterial Type III Secreted Effectors Using PSSM Profiles and XGBoost Feature Selection
Computational and Mathematical Methods in Medicine Pub Date : 2021-01-06 , DOI: 10.1155/2021/6690299
Chenchen Ding 1 , Haitao Han 1 , Qianyue Li 1 , Xiaoxia Yang 1 , Taigang Liu 1

Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in bioinformatics because of its crucial role in understanding host-pathogen interactions and in developing better therapeutic targets against pathogens. However, recognizing all effector proteins with traditional experimental approaches is often time-consuming and laborious. Developing computational methods that accurately predict putative novel effectors is therefore important for reducing the number of biological experiments needed for validation. In this study, we propose a method, called iT3SE-PX, to identify T3SEs based solely on protein sequences. First, three kinds of features are extracted from position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm is used to rank these features by their classification ability. Finally, the optimal features are selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. On two benchmark datasets, we conducted a 5-fold cross-validation (CV) repeated 100 times with randomized splits and an independent test, respectively. The experimental results demonstrate that the proposed method achieves superior performance compared with most existing methods and could serve as a useful tool for identifying putative T3SEs given only sequence information.
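
The evaluation protocol mentioned in the abstract, a randomized 5-fold CV repeated 100 times, can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `stratified_folds` and `repeated_cv_splits`, the seed, and the toy labels are all hypothetical, and stratifying the folds by class is an assumption that the abstract does not state. The sketch only generates the train/test index splits; feature extraction, XGBoost ranking, and SVM training are left out.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions.

    Stratification is an assumption of this sketch, not a detail taken
    from the abstract.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # a fresh random order on every call
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices out round-robin
    return folds


def repeated_cv_splits(labels, k=5, repeats=100, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for every split."""
    rng = random.Random(seed)
    for rep in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield rep, f, sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 10 effectors (1) and 20 non-effectors (0).
labels = [1] * 10 + [0] * 20
splits = list(repeated_cv_splits(labels, k=5, repeats=100))
print(len(splits))  # 100 repeats x 5 folds = 500 splits
```

In practice, a classifier would be trained on each `train_indices` set and scored on the matching `test_indices` set, and the scores averaged over all 500 splits so that the reported performance depends less on any single random partition.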

Updated: 2021-01-06