A weighted ensemble-based active learning model to label microarray data.
Medical & Biological Engineering & Computing ( IF 2.6 ) Pub Date : 2020-08-08 , DOI: 10.1007/s11517-020-02238-1
Rajonya De 1 , Anuran Chakraborty 1 , Agneet Chatterjee 1 , Ram Sarkar 1

Classification of cancerous genes from microarray data is an important research area in bioinformatics. Large amounts of microarray data are available, but labeling them is very costly. This paper proposes an active learning model, a semi-supervised classification approach, for labeling microarray data so that predictions can be made even with a small amount of labeled data. Initially, a pool of unlabeled instances is given, from which some instances are randomly chosen for labeling. Subsequent instances to be labeled are chosen from the unlabeled pool by selection algorithms. The proposed method follows an ensemble approach: it combines the decisions of three classifiers to reach a consensus that predicts the class label more accurately, while ensuring that each individual classifier learns in an uncorrelated manner. Our method combines the heuristic techniques an active learning algorithm uses to choose training samples with the multiple-learner paradigm of an ensemble, optimizing the search space by choosing efficiently from an already sparse learning pool. Evaluated on 10 microarray datasets, the proposed method achieves performance comparable with state-of-the-art methods. The code and datasets are available at https://github.com/anuran-Chakraborty/Active-learning.

Flowchart of the proposed ensemble-based active learning framework
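
The loop described in the abstract (random initial labeling, a weighted vote over three classifiers, then repeated selection from the unlabeled pool) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation, which is in the repository linked above. The abstract does not name the three classifiers, the weighting scheme, or the selection algorithms, so everything specific here is an assumption: the learners (nearest centroid, 1-NN, 3-NN), the leave-one-out accuracy weights, the smallest-margin query rule, and all function names are hypothetical. The sketch also assumes `n_initial` is at least 2 so that leave-one-out training sets are never empty.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(train_X, train_y):
    """Fit a nearest-class-mean classifier; returns a predict(x) function."""
    groups = {}
    for x, y in zip(train_X, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(rows) for col in zip(*rows)]
                 for y, rows in groups.items()}
    return lambda x: min(centroids, key=lambda y: dist2(x, centroids[y]))

def knn(k):
    """Build a fit function for a k-nearest-neighbour classifier."""
    def fit(train_X, train_y):
        def predict(x):
            order = sorted(range(len(train_X)),
                           key=lambda i: dist2(x, train_X[i]))
            return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]
        return predict
    return fit

# Assumption: an illustrative trio of learners, not the ones used in the paper.
LEARNERS = [nearest_centroid, knn(1), knn(3)]

def loo_accuracy(fit, X, y):
    """Leave-one-out accuracy on the labeled set (assumed weighting scheme)."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += model(X[i]) == y[i]
    return hits / len(X)

def vote(models, weights, x):
    """Weighted vote; returns (label, margin between the top two scores)."""
    scores = Counter()
    for model, w in zip(models, weights):
        scores[model(x)] += w
    ranked = scores.most_common(2)
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return ranked[0][0], ranked[0][1] - runner_up

def active_learn(X, oracle, n_initial, n_queries, seed=0):
    """Pool-based loop: random initial labels, then query low-margin instances."""
    rng = random.Random(seed)
    labeled = rng.sample(range(len(X)), n_initial)  # random initial labeling
    pool = [i for i in range(len(X)) if i not in labeled]
    for _ in range(n_queries + 1):
        Xl = [X[i] for i in labeled]
        yl = [oracle[i] for i in labeled]           # labels come from the oracle
        models = [fit(Xl, yl) for fit in LEARNERS]
        weights = [loo_accuracy(fit, Xl, yl) for fit in LEARNERS]
        if len(labeled) == n_initial + n_queries or not pool:
            break
        # Assumed selection rule: the smallest weighted-vote margin marks the
        # instance the ensemble disagrees on most.
        pick = min(pool, key=lambda i: vote(models, weights, X[i])[1])
        pool.remove(pick)
        labeled.append(pick)
    return labeled, lambda x: vote(models, weights, x)[0]
```

Calling `active_learn(X, y, n_initial=4, n_queries=5)` on a list of feature vectors `X` with oracle labels `y` returns the indices that were labeled and a prediction function backed by the final weighted ensemble. Querying by smallest margin is only one possible heuristic; the paper refers to selection algorithms in the plural, so other criteria may apply.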




Updated: 2020-08-09