DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy.
Briefings in Bioinformatics ( IF 9.5 ) Pub Date : 2020-06-29 , DOI: 10.1093/bib/bbaa125
Ruopeng Xie 1 , Jiahui Li 1 , Jiawei Wang 2 , Wei Dai 3 , André Leier 4 , Tatiana T Marquez-Lago 4 , Tatsuya Akutsu 5 , Trevor Lithgow 6 , Jiangning Song 7 , Yanju Zhang 8

Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their attractive advantages and performance improvements, the existing methods have two main limitations. Firstly, as the characteristics and mechanisms of VFs continually evolve with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated datasets; secondly, few systematic feature engineering efforts have been made to examine the utility of different types of features for model performance, as the majority of tools extract only a few types of features. Addressing these issues is likely to improve the accuracy of VF predictors significantly, which would be particularly useful for genome-wide prediction of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that uses the stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three DL algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. To integrate their individual strengths, DeepVF combines these baseline models to construct the final meta model using the stacking strategy.
Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user’s viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
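
The stacking strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify how the meta features are built, so everything here is an assumption: the data are synthetic, two small logistic regressions stand in for the 62 baseline models, each baseline sees its own feature group as a stand-in for heterogeneous features, and the meta model is trained on out-of-fold baseline probabilities, which is one common way to do stacking. The names `LogisticModel`, `fit_stack`, `predict_stack`, `make_baselines` and `make_toy_data` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class LogisticModel:
    """Logistic regression fitted by batch gradient descent (toy stand-in learner)."""

    def __init__(self, cols=None, lr=0.5, epochs=200):
        self.cols, self.lr, self.epochs = cols, lr, epochs

    def _view(self, row):
        # A baseline sees only its own feature group; cols=None means all columns.
        return row if self.cols is None else [row[c] for c in self.cols]

    def fit(self, X, y):
        rows = [self._view(r) for r in X]
        self.w, self.b = [0.0] * len(rows[0]), 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * len(self.w), 0.0
            for r, t in zip(rows, y):
                err = sigmoid(sum(w * v for w, v in zip(self.w, r)) + self.b) - t
                gw = [g + err * v for g, v in zip(gw, r)]
                gb += err
            self.w = [w - self.lr * g / len(rows) for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / len(rows)
        return self

    def predict_proba(self, X):
        return [sigmoid(sum(w * v for w, v in zip(self.w, self._view(r))) + self.b)
                for r in X]


def fit_stack(make_baselines, X, y, k=5):
    """Train baselines and a meta model on out-of-fold baseline probabilities."""
    n = len(X)
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    oof = [[0.0] * len(make_baselines()) for _ in range(n)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        models = [m.fit([X[i] for i in train], [y[i] for i in train])
                  for m in make_baselines()]
        for j, m in enumerate(models):
            for i, p in zip(held_out, m.predict_proba([X[i] for i in held_out])):
                oof[i][j] = p  # prediction for a sample the model never saw
    meta = LogisticModel().fit(oof, y)                    # meta model on meta features
    baselines = [m.fit(X, y) for m in make_baselines()]   # refit on all training data
    return baselines, meta


def predict_stack(baselines, meta, X):
    per_model = [m.predict_proba(X) for m in baselines]
    return meta.predict_proba([list(row) for row in zip(*per_model)])


def make_toy_data(n, seed=0):
    # Synthetic two-class data: four Gaussian features shifted by the label.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label else -1.0
        X.append([rng.gauss(shift, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


def make_baselines():
    # Two baselines, each restricted to one (hypothetical) feature group.
    return [LogisticModel(cols=[0, 1]), LogisticModel(cols=[2, 3])]


X, y = make_toy_data(300)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
baselines, meta = fit_stack(make_baselines, X_train, y_train)
probs = predict_stack(baselines, meta, X_test)
accuracy = sum((p >= 0.5) == bool(t) for p, t in zip(probs, y_test)) / len(y_test)
print(f"stacked accuracy on held-out toy data: {accuracy:.2f}")
```

Training the meta model on out-of-fold predictions, rather than on predictions the baselines make for their own training samples, keeps it from simply learning to trust whichever baseline overfits most. In practice a library implementation such as scikit-learn's `StackingClassifier` would normally replace the hand-written loop.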

Updated: 2020-06-30