Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping
Medical & Biological Engineering & Computing ( IF 3.2 ) Pub Date : 2021-02-17 , DOI: 10.1007/s11517-021-02324-y
Joyshri Das 1 , Soma Barman Mandal 1

Classifying Homo sapiens gene behavior with computational biology is a recent research trend. However, monitoring gene activity profiles and genetic behavior from the alphabetic DNA sequence using a non-invasive method remains a major challenge in functional genomics. This paper addresses that problem and attempts to differentiate Homo sapiens genes using linear discriminant analysis (LDA). Annotated protein-coding sequences of Homo sapiens genes, collected from NCBI, are used as test samples. A minimum entropy-based mapping (MEM) technique helps extract the maximum information from the numerical DNA sequences. The proposed LDA technique classifies Homo sapiens genes based on the following features: composition of hydrophilic amino acids, dominance of the amino acid arginine, and magnitude and size of individual amino acids. The algorithm was tested on 84 healthy and cancer Homo sapiens genes from prostate and breast cells. Classification performance is assessed by sensitivity (89.12%), specificity (91.9%), accuracy (90.87%), F1 score (92.03%), Matthews correlation coefficient (81.04%), and miss rate (9.12%), and the technique outperforms four other existing classifiers. The results are cross-validated using the Rayleigh PDF and a mutual information technique. The Fisher test, two-sample t-test, and relative entropy test are used to verify the efficacy of the classifier.
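For readers unfamiliar with the performance measures listed in the abstract, the following is a minimal sketch of how they are commonly derived from a binary confusion matrix. It uses the textbook definitions; the abstract does not state how the paper aggregated its figures, so this is not necessarily the authors' procedure. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Textbook confusion-matrix metrics, returned as fractions.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives. Hypothetical helper for
    illustration only, not the paper's code.
    """
    sensitivity = _ratio(tp, tp + fn)            # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)            # true negative rate
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    miss_rate = _ratio(fn, fn + tp)              # false negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "miss_rate": miss_rate,
    }


# Made-up counts for illustration; these are NOT the paper's results.
example = binary_metrics(tp=45, fp=5, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```

Multiplying each fraction by 100 gives percentages in the form quoted in the abstract. The Matthews correlation coefficient ranges from -1 to 1, so a value reported as a percentage corresponds to the coefficient scaled by 100.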




Updated: 2021-02-17