Ensemble-based active learning using fuzzy-rough approach for cancer sample classification
Engineering Applications of Artificial Intelligence (IF 8), Pub Date: 2020-03-09, DOI: 10.1016/j.engappai.2020.103591
Ansuman Kumar, Anindya Halder

Background and Objective: Classifying cancer from gene expression data is a major research area in machine learning and medical science. Conventional supervised methods often fail to reach the desired classification accuracy because gene expression data provide too few training samples. Ensemble-based active learning can be effective in this situation: each base classifier identifies a few informative samples, and the decisions of all base classifiers are combined to select the most informative ones. Subject experts label these most informative samples, which are then added to the training set and can improve classification accuracy.
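The query-and-label loop described above can be sketched in a few lines. This is only an illustration of generic ensemble-based (committee) active learning, not the fuzzy-rough method of the paper: the three base classifiers (`nearest_centroid` and two k-NN variants), the vote-entropy disagreement score, the `oracle` callback standing in for the subject expert, and all parameter names are assumptions chosen for the example.

```python
import math
from collections import Counter

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [p for p, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = math.dist(centroid, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def k_nearest(k):
    """Build a k-NN predictor (majority vote among the k closest points)."""
    def predict(train_X, train_y, x):
        ranked = sorted(zip(train_X, train_y), key=lambda py: math.dist(py[0], x))
        votes = Counter(y for _, y in ranked[:k])
        return votes.most_common(1)[0][0]
    return predict

# Hypothetical committee of base classifiers (stand-ins, not the paper's).
COMMITTEE = [nearest_centroid, k_nearest(1), k_nearest(3)]

def vote_entropy(votes):
    """Disagreement among committee members; zero means a unanimous vote."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def active_learning(labeled_X, labeled_y, pool_X, oracle, rounds=3, batch=2):
    """Each round, ask the oracle to label the samples the committee disputes most."""
    labeled_X, labeled_y, pool_X = list(labeled_X), list(labeled_y), list(pool_X)
    for _ in range(rounds):
        if not pool_X:
            break
        scores = [
            vote_entropy([clf(labeled_X, labeled_y, x) for clf in COMMITTEE])
            for x in pool_X
        ]
        picked = sorted(range(len(pool_X)), key=lambda i: scores[i], reverse=True)[:batch]
        for i in sorted(picked, reverse=True):  # pop from the end so indices stay valid
            x = pool_X.pop(i)
            labeled_X.append(x)
            labeled_y.append(oracle(x))  # the expert supplies the true label
    return labeled_X, labeled_y, pool_X

def ensemble_predict(train_X, train_y, x):
    """Final decision: majority vote of the committee."""
    votes = Counter(clf(train_X, train_y, x) for clf in COMMITTEE)
    return votes.most_common(1)[0][0]
```

Calling `active_learning(X0, y0, pool, oracle)` with a small labeled seed set and an unlabeled pool returns the enlarged training set and the remaining pool; `ensemble_predict` then classifies new samples with the enlarged set. In practice the base classifiers and the informativeness score would be replaced by whatever the chosen method prescribes.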

Method: We propose a novel ensemble-based active learning method that uses a fuzzy-rough approach to classify cancer samples from microarray gene expression data. The proposed method can deal with the uncertainty, overlap and indiscernibility usually present in the subtype classes of gene expression data, and it can improve the accuracy of the individual base classifiers when training samples are limited.

Results: The proposed method is validated on eight microarray gene expression datasets. Its performance in terms of classification accuracy, precision, recall, F1-measure and kappa is compared with that of six other methods. Compared with its nearest competing method, the proposed method improves accuracy by 2.96%, 9.34%, 0.93%, 3.69%, 7.2% and 4.53% on the Colon cancer, Prostate cancer, SRBCT, Ovarian cancer, DLBCL and Central nervous system datasets, respectively. Paired t-test results support the statistical significance of the improvements in favor of the proposed method for most of the datasets.
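For readers unfamiliar with two of the evaluation tools named above, the sketch below shows how Cohen's kappa and the paired t statistic are computed from their standard textbook definitions. The function names are hypothetical and the inputs in the usage note are made-up toy values, not results from the paper.

```python
import math
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance.

    Undefined (division by zero) when chance agreement is exactly 1,
    e.g. when both label lists contain a single identical class.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired scores, e.g. two methods evaluated on the same runs."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(variance / n)
```

For example, `cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1])` gives 0.0 (agreement no better than chance), while identical label lists give 1.0. The t statistic is compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.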

Conclusion: The proposed method is an effective general-purpose ensemble-based active learning technique built on the fuzzy-rough concept, and it can therefore be applied to other classification problems in the future.




Updated: 2020-03-09