Gene Selection in Multi-class Imbalanced Microarray Datasets Using Dynamic Length Particle Swarm Optimization
Current Bioinformatics (IF 2.4), Pub Date: 2021-05-31, DOI: 10.2174/1574893615999201002093834
R. Devi Priya, R. Sivaraj

Background: Microarray gene expression datasets usually contain a large number of genes, which complicates downstream operations such as classification, clustering, and other kinds of analysis. During classification, identifying the salient genes is a demanding task that requires careful selection.

Methods: Classifying multi-class datasets is more demanding than binary classification. When there are multiple class labels, the datasets are more likely to be imbalanced. The number of samples belonging to each class can vary widely, so the classification process may become biased when unsuitable samples are chosen for training. Little research addresses all three of these scenarios together in microarray datasets.

Results and Discussion: The paper fills this gap with the following contributions: i) selecting salient genes for classification using the multiSURF algorithm, ii) identifying the right instances from imbalanced datasets using the Retained Tomek Link algorithm, and iii) performing gene selection for multi-class classification using Dynamic Length Particle Swarm Optimization (DPSO).
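
As a rough illustration of the instance-selection idea in contribution ii), the sketch below finds classical Tomek links: pairs of samples from different classes that are each other's nearest neighbour. This is the standard definition only, not the paper's Retained Tomek Link variant, whose details the abstract does not give. The function names, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the sample closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels (classical Tomek links)."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return sorted(
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    )


# Hypothetical toy data: three classes, with one A/B pair lying very close.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0), (3.1, 3.0)]
labels = ["A", "A", "A", "B", "C", "C"]
links = tomek_links(points, labels)
print(links)  # samples 2 and 3 are mutual neighbours from different classes
```

In the classical undersampling use, the majority-class member of each linked pair would be a candidate for removal; how the paper decides which instances to retain is not stated here.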

Conclusion: The proposed method is implemented on multi-class imbalanced microarray datasets, and its final classification performance is encouraging and better than that of the compared methods.




Updated: 2021-05-31