Particle Swarm Optimization Based Swarm Intelligence for Active Learning Improvement: Application on Medical Data Classification
Cognitive Computation ( IF 5.4 ) Pub Date : 2020-08-05 , DOI: 10.1007/s12559-020-09739-z
Nawel Zemmal , Nabiha Azizi , Mokhtar Sellami , Soraya Cheriguene , Amel Ziani , Monther AlDwairi , Nadjette Dendani

Semi-supervised learning targets the common situation where labeled data are scarce but unlabeled data are abundant, using the unlabeled data to help supervised learning tasks. In practice, it can make sense to combine active learning with semi-supervised learning: the learning algorithm picks a set of unlabeled instances to be labeled by a domain expert, and these then serve as the labeled data set. However, existing approaches are computationally expensive and require searching through the entire unlabeled dataset, which may contain redundant instances that provide no instructive information to the classifier and can decrease its performance. To address this optimization problem, a hybrid system that combines active learning (AL) and particle swarm optimization (PSO) is proposed to reduce the cost of labeling while building a more efficient classifier. The novelty of this work resides in the integration of a bio-inspired optimization algorithm into the machine learning strategy. Furthermore, a novel uncertainty measure is integrated into the PSO algorithm as an objective function to select, from massive amounts of medical instances, those deemed most informative. To evaluate the effectiveness of the proposed approach, eighteen (18) benchmark datasets were used, and the approach was compared against well-known classifiers from three learning paradigms: AL–NB, an active learning algorithm using a Naïve Bayes classifier and the Margin Sampling strategy; SVM (Support Vector Machine) and ELM (Extreme Learning Machine), with supervised learning; and TSVM (Transductive Support Vector Machine), with semi-supervised learning. Experiments showed that the proposed approach is effective in reducing the effort required from experts for medical data annotation to produce an accurate classifier. The active learning approach has been utilized to optimize the expensive task of labeling.
Based on a novel uncertainty measure, the nature-inspired PSO algorithm attempts to select, from massive amounts of unlabeled medical instances, those considered informative, while at the same time improving classifier performance. The experiments confirm that the proposed strategy significantly enhances the performance of the AL algorithm compared with the commonly used uncertainty strategies. It achieves performance similar to that of fully supervised and semi-supervised algorithms while requiring much less labeling. As a future extension of this work, it would be interesting to integrate other evolutionary optimization algorithms and compare them with our approach. It would also be beneficial to test the impact of using other variants of the PSO algorithm in our approach, and to test more classification algorithms in the experimentation process.
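The abstract does not define the novel uncertainty measure or the PSO configuration, so the following is only a minimal sketch of the general idea under stated assumptions: a plain margin-based uncertainty (one minus the gap between the two highest class probabilities) stands in for the paper's measure, `toy_classifier` is a hypothetical placeholder for a trained model, and the names `pso_select`, `nearest_index`, and all parameter values are illustrative rather than taken from the paper. Particles move through feature space, each position is snapped to its nearest unlabeled instance, and the swarm converges toward the instance the classifier is least sure about.

```python
import math
import random


def margin_uncertainty(probs):
    """Stand-in objective: 1 - (p_best - p_second); higher means more uncertain."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def toy_classifier(x):
    """Hypothetical two-class model: a logistic score on the first feature."""
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x[0]))
    return [1.0 - p1, p1]


def nearest_index(point, pool):
    """Index of the pool instance closest to `point` (squared Euclidean distance)."""
    return min(range(len(pool)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, pool[i])))


def pso_select(pool, predict_proba, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (index, uncertainty) of the pool instance the swarm settles on."""
    rng = random.Random(seed)
    dim = len(pool[0])

    def fitness(pos):
        # Score a position by the uncertainty of its nearest unlabeled instance.
        return margin_uncertainty(predict_proba(pool[nearest_index(pos, pool)]))

    # Start every particle on a randomly chosen unlabeled instance, at rest.
    positions = [list(rng.choice(pool)) for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO update: inertia + cognitive pull + social pull.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            f = fitness(positions[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = positions[i][:], f

    return nearest_index(gbest, pool), gbest_fit


if __name__ == "__main__":
    unlabeled = [[-2.0, 0.3], [-1.0, 1.0], [0.05, -0.4], [1.5, 0.2], [2.5, -1.0]]
    idx, score = pso_select(unlabeled, toy_classifier)
    print("query instance", idx, "with uncertainty", round(score, 3))
```

In an active learning loop, the returned instance would be sent to the expert for labeling, the classifier retrained, and the search repeated. Note that this sketch reduces the number of classifier evaluations but still scans the pool in `nearest_index`; a spatial index would be needed to avoid that on large datasets.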

Updated: 2020-08-05