Comparisons of classification methods for viral genomes and protein families using alignment-free vectorization
Statistical Applications in Genetics and Molecular Biology (IF 0.9), Pub Date: 2018-06-30, DOI: 10.1515/sagmb-2018-0004
Hsin-Hsiung Huang, Shuai Hao, Saul Alarcon, Jie Yang

In this paper, we propose a statistical classification method based on discriminant analysis that uses the first and second moments of the positions of each nucleotide in a genome sequence as features, and we compare its performance with that of other classification methods and with the natural vector method for comparative genomic analysis. We examine the normality of the proposed features. The statistical classification models used include linear discriminant analysis, quadratic discriminant analysis, diagonal linear discriminant analysis, the k-nearest-neighbor classifier, logistic regression, support vector machines, and classification trees. All of these classifiers are tested on a viral genome dataset and a protein dataset to predict viral Baltimore labels, viral family labels, and protein family labels.
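
The feature idea described in the abstract can be illustrated with a minimal Python sketch. It assumes 1-based positions, takes the plain mean of a nucleotide's positions as the first moment, and takes the population variance of those positions as the second moment. The exact normalization used in the paper is not stated here, so these choices, the zero fallback for absent nucleotides, and the function name `position_moment_features` are illustrative assumptions rather than the authors' implementation.

```python
NUCLEOTIDES = "ACGT"


def position_moment_features(sequence):
    """Return [mean position, variance of positions] for A, C, G, T in turn.

    Assumptions made for this sketch (not taken from the paper):
    positions are 1-based, the variance is the population variance,
    and a nucleotide that never occurs contributes (0.0, 0.0).
    Characters other than A, C, G, T are ignored.
    """
    sequence = sequence.upper()
    features = []
    for nucleotide in NUCLEOTIDES:
        # Collect the 1-based positions where this nucleotide occurs.
        positions = [
            index
            for index, base in enumerate(sequence, start=1)
            if base == nucleotide
        ]
        if not positions:
            features.extend([0.0, 0.0])
            continue
        count = len(positions)
        mean = sum(positions) / count
        variance = sum((p - mean) ** 2 for p in positions) / count
        features.extend([mean, variance])
    return features
```

Each sequence, whatever its length, maps to a fixed-length vector of eight numbers, which is what lets classifiers such as linear discriminant analysis or k-nearest neighbors be applied without first aligning the sequences.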

Last updated: 2018-06-30