Identification of potential biomarkers on microarray data using distributed gene selection approach.
Mathematical Biosciences (IF 1.9), Pub Date: 2019-07-18, DOI: 10.1016/j.mbs.2019.108230
Alok Kumar Shukla, Diwakar Tripathi

In recent years, several feature selection (FS) methods have been introduced to identify biomarkers from gene expression datasets. FS has attracted extensive attention as a way to solve the cancer classification problem, but existing methods have some limitations. First, most FS approaches incur a high computational cost because of their centralized data structure. Second, a gene ranked as irrelevant, which could perform well in terms of classification accuracy when combined with a suitable subset of genes, is left out of the selection. To resolve these problems, we introduce a novel two-stage FS approach that combines Spearman's correlation (SC) with distributed filter FS methods to select highly discriminative genes for distinguishing samples in high-dimensional datasets. In the distributed FS stage, the data are partitioned by features (vertical distribution), and a merging procedure then updates the feature subset while improving classification accuracy. The approach also quantifies gene-gene and gene-class relationships and simultaneously detects subsets of essential genes. The proposed method is evaluated on six gene expression datasets using four well-known classifiers: support vector machine, naïve Bayes, k-nearest neighbor, and decision tree. Its performance is compared with traditional filter techniques such as Relief-F, information gain, minimum redundancy maximum relevance, joint mutual information, chi-square, and t-test. The experimental results demonstrate that the proposed method significantly improves both computational time and classification accuracy compared with standard algorithms applied to the non-partitioned dataset.
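
The two stages described above can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation; it only follows the outline given in the abstract. Everything concrete here is an assumption: the function names (stage1_spearman, stage2_distributed, loo_1nn_accuracy), the thresholds relevance_min and redundancy_max, the round-robin vertical split, the use of |Spearman rho| as the local filter inside each partition, and leave-one-out 1-nearest-neighbour accuracy (k-NN with k = 1) as the merging criterion. The toy data are made up purely for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def column(X: Matrix, j: int) -> List[float]:
    return [row[j] for row in X]


def stage1_spearman(X: Matrix, y: Sequence[int], relevance_min: float = 0.3,
                    redundancy_max: float = 0.9) -> Tuple[List[int], Dict[int, float]]:
    """Hypothetical stage 1: keep genes with |rho(gene, class)| >= relevance_min,
    then drop a gene if |rho(gene, gene)| with an already-kept, higher-scoring
    gene reaches redundancy_max."""
    scores = {j: abs(spearman(column(X, j), y)) for j in range(len(X[0]))}
    candidates = sorted((j for j in scores if scores[j] >= relevance_min),
                        key=lambda j: -scores[j])
    kept: List[int] = []
    for j in candidates:
        if all(abs(spearman(column(X, j), column(X, k))) < redundancy_max for k in kept):
            kept.append(j)
    return kept, scores


def loo_1nn_accuracy(X: Matrix, y: Sequence[int], genes: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given genes."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        def dist(r: int) -> float:
            return sum((X[i][g] - X[r][g]) ** 2 for g in genes)
        nearest = min((r for r in range(len(X)) if r != i), key=dist)
        correct += y[nearest] == y[i]
    return correct / len(X)


def stage2_distributed(X: Matrix, y: Sequence[int], genes: List[int],
                       scores: Dict[int, float], n_parts: int = 2,
                       top_per_part: int = 2) -> Tuple[List[int], float]:
    """Hypothetical stage 2: split genes vertically into n_parts disjoint blocks,
    keep the top genes of each block, then merge a block's genes into the
    subset only if LOO accuracy strictly improves."""
    parts = [genes[p::n_parts] for p in range(n_parts)]  # round-robin vertical split
    local = [sorted(p, key=lambda j: -scores[j])[:top_per_part] for p in parts]
    selected: List[int] = []
    best = 0.0
    for subset in local:
        trial = selected + [g for g in subset if g not in selected]
        acc = loo_1nn_accuracy(X, y, trial)
        if acc > best:
            selected, best = trial, acc
    return selected, best


# Toy data (made up for illustration): 8 samples x 5 genes, two classes.
# Gene 1 is exactly 2 * gene 0, so stage 1 should drop it as redundant.
X_TOY: Matrix = [
    [1, 2, 5, 9, 2], [2, 4, 3, 8, 2], [3, 6, 6, 9, 3], [4, 8, 2, 7, 3],
    [10, 20, 4, 2, 2], [11, 22, 7, 1, 2], [12, 24, 1, 3, 3], [13, 26, 8, 2, 3],
]
Y_TOY = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    kept, scores = stage1_spearman(X_TOY, Y_TOY)
    selected, acc = stage2_distributed(X_TOY, Y_TOY, kept, scores)
    print("stage 1 kept:", kept)
    print("stage 2 selected:", selected, "LOO 1-NN accuracy:", acc)
```

In this sketch the partitions are processed in a single loop; in a genuinely distributed setting each partition's local filtering would run on a separate worker, and only the short per-partition gene lists would be sent back for merging. The strict-improvement rule keeps the merged subset small, at the cost of never adding genes once accuracy has saturated.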

Updated: 2019-11-01