Learning transferable deep convolutional neural networks for the classification of bacterial virulence factors.
Bioinformatics (IF 5.8), Pub Date: 2020-04-06, DOI: 10.1093/bioinformatics/btaa230
Dandan Zheng 1 , Guansong Pang 2 , Bo Liu 1 , Lihong Chen 1 , Jian Yang 1

Motivation
Identification of virulence factors (VFs) is critical to the elucidation of bacterial pathogenesis and the prevention of related infectious diseases. Current computational methods for VF prediction either focus on binary classification or cover only a few VF classes that have sufficient samples. However, thousands of VF classes are present in real-world scenarios, and many of them have only a very limited number of samples available.
Results
We first construct a large VF dataset, covering 3,446 VF classes with 160,495 sequences, and then propose deep convolutional neural network (CNN) models for VF classification. We show that (i) for common VF classes with sufficient samples, our models can achieve state-of-the-art performance with an overall accuracy of 0.9831 and an F1-score of 0.9803; (ii) for uncommon VF classes with limited samples, our models can learn transferable features from auxiliary data and achieve good performance with accuracy ranging from 0.9277 to 0.9512 and F1-score ranging from 0.9168 to 0.9446 when combined with different predefined features, outperforming traditional classifiers by 1%-13% in accuracy and by 1%-16% in F1-score.
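For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and a multi-class F1-score can be computed from predicted labels. It assumes macro-averaging (an unweighted mean of per-class F1), which the abstract does not specify; the function names and the toy labels are hypothetical and are not taken from the VFNet code or dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption here)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not drawn from the paper's dataset.
y_true = ["adhesin", "toxin", "toxin", "secretion", "adhesin", "toxin"]
y_pred = ["adhesin", "toxin", "adhesin", "secretion", "adhesin", "toxin"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

With thousands of classes and many of them rare, the choice of averaging matters: macro-averaging weights every class equally, whereas a sample-weighted average would be dominated by the common classes.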
Availability
All of our datasets are made publicly available at http://www.mgc.ac.cn/VFNet/, and the source code of our models is publicly available at https://github.com/zhengdd0422/VFNet
Supplementary information
Supplementary data are available at Bioinformatics online.


Updated: 2020-04-06