Prediction of prokaryotic transposases from protein features with machine learning approaches
Microbial Genomics ( IF 4.0 ) Pub Date : 2021-07-26 , DOI: 10.1099/mgen.0.000611
Qian Wang 1 , Jun Ye 2 , Teng Xu 3 , Ning Zhou 4 , Zhongqiu Lu 4 , Jianchao Ying 4, 5

Identification of prokaryotic transposases (Tnps) gives insight not only into the spread of antibiotic resistance and virulence but also into the process of DNA movement. This study aimed to develop a classifier for predicting Tnps in bacteria and archaea using machine learning (ML) approaches. We extracted a total of 2751 protein features from a training dataset of 14852 Tnps and 14852 controls, and selected 75 features as predictive signatures using a combination of the mutual information and least absolute shrinkage and selection operator (LASSO) algorithms. By aggregating these signatures, an ensemble classifier that integrates a collection of individual ML-based classifiers was developed to identify Tnps. Further validation showed that this classifier achieved an average AUC of 0.955 and matched or exceeded the performance of other common methods. Based on this ensemble classifier, a stand-alone command-line tool named TnpDiscovery was built to make Tnp prediction convenient for bioinformaticians and experimental researchers. This study demonstrates the effectiveness of ML approaches in identifying Tnps and should facilitate the discovery of novel Tnps in the future.
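
The abstract names three building blocks: a mutual-information filter, an ensemble that aggregates individual classifiers, and AUC as the evaluation metric. The sketch below illustrates each in plain Python under simplifying assumptions. It is not the TnpDiscovery implementation: the function names are hypothetical, the features are assumed to be already discretized, the ensemble is assumed to be a simple average of scores (the abstract does not say how the classifiers are combined), and the LASSO stage of the feature selection is not shown.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and class labels.

    Assumption: feature values are already discretized. Continuous protein
    features would need binning or a different estimator.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def ensemble_score(score_lists):
    """Average per-sample scores across classifiers (one list per classifier).

    Assumption: plain soft voting, chosen only for illustration.
    """
    return [sum(column) / len(column) for column in zip(*score_lists)]


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N) and is meant for
    small illustrative inputs only.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In a filter step of this kind, each candidate feature would be scored with `mutual_information` against the Tnp/control labels and only the highest-scoring features passed on to a sparse model such as LASSO. Averaging is one of the simplest ways to combine classifier outputs, and the real tool may weight or stack its members differently.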

Updated: 2021-07-27