A Semisupervised Learning Scheme with Self-Paced Learning for Classifying Breast Cancer Histopathological Images
Computational Intelligence and Neuroscience, Pub Date: 2020-12-08, DOI: 10.1155/2020/8826568
Sarpong Kwadwo Asare¹, Fei You¹, Obed Tettey Nartey²

The unavailability of large amounts of well-labeled data poses a significant challenge in many medical imaging tasks. Even when sufficient data are available, labeling them accurately is arduous and time-consuming and requires expert skills. Class imbalance further compounds these problems and presents a considerable challenge for many machine learning algorithms. In light of this, algorithms that can exploit large amounts of unlabeled data together with a small amount of labeled data, while remaining robust to data imbalance, offer promising prospects for building highly efficient classifiers. This work proposes a semisupervised learning method that integrates self-training and self-paced learning to generate and select pseudolabeled samples for classifying breast cancer histopathological images. A novel pseudolabel generation and selection algorithm is introduced in the learning scheme to generate and select highly confident pseudolabeled samples covering both well-represented and less-represented classes. This approach improves performance in three ways. It jointly learns a model and optimizes the generation of pseudolabels on unlabeled target data. It uses those pseudolabels to augment the training data. It then retrains the model with the generated labels. A class-balancing framework that normalizes the class-wise confidence scores is also proposed. It prevents the model from ignoring samples from less-represented classes (hard-to-learn samples), thereby effectively handling data imbalance. Extensive experimental evaluation on the BreakHis dataset demonstrates the effectiveness of the proposed method.
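The abstract does not spell out the selection rule. The sketch below shows one common way to realize class-balanced, self-paced pseudolabel selection, modeled loosely on class-balanced self-training. For each class, it sets a threshold at the confidence of the sample at a chosen rank among that class's predictions. Each confidence is then divided by its class threshold, and samples whose normalized score reaches 1 are kept. The kept share grows from round to round, moving from easy to hard samples. The function names `class_balanced_pseudolabels` and `self_paced_portions`, the rank rule, the schedule values and the toy probabilities are all hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np


def class_balanced_pseudolabels(probs, portion):
    """Select pseudolabels using class-wise normalized confidence (illustrative sketch).

    probs:   (n_samples, n_classes) softmax outputs on unlabeled images.
    portion: fraction in (0, 1] of each class's predictions to keep.
    Returns (indices, labels) of the selected unlabeled samples.
    """
    preds = probs.argmax(axis=1)   # hard pseudolabel for every sample
    conf = probs.max(axis=1)       # confidence of that pseudolabel
    thresholds = np.ones(probs.shape[1])
    for c in range(probs.shape[1]):
        # Confidences of samples predicted as class c, most confident first.
        class_conf = np.sort(conf[preds == c])[::-1]
        if class_conf.size == 0:
            continue  # no sample predicted as c, so its threshold is never used
        k = max(int(np.ceil(portion * class_conf.size)) - 1, 0)
        thresholds[c] = class_conf[k]  # per-class cut-off at the chosen rank
    # Dividing by the class threshold puts every class on the same scale:
    # a score >= 1 means "within the top `portion` of its own class".
    normalized = conf / thresholds[preds]
    selected = np.flatnonzero(normalized >= 1.0)
    return selected, preds[selected]


def self_paced_portions(start=0.2, step=0.1, max_portion=0.6, rounds=5):
    """Hypothetical easy-to-hard schedule: keep a larger share of pseudolabels each round."""
    return [min(start + r * step, max_portion) for r in range(rounds)]


# Toy imbalanced example: four confident class-0 predictions and two
# less confident class-1 (minority) predictions.
probs = np.array([
    [0.99, 0.01],
    [0.97, 0.03],
    [0.95, 0.05],
    [0.93, 0.07],
    [0.40, 0.60],
    [0.35, 0.65],
])
idx, labels = class_balanced_pseudolabels(probs, portion=0.5)
print(idx, labels)  # [0 1 5] [0 0 1]

for round_no, p in enumerate(self_paced_portions()):
    # In a full pipeline: select with portion p, add the samples to the
    # labeled set, retrain the model, and recompute `probs` before the next round.
    chosen, _ = class_balanced_pseudolabels(probs, portion=p)
    print(round_no, round(p, 2), chosen.size)
```

Each class is thresholded against its own confidence distribution. As a result, the minority class 1 still contributes its most confident sample, even though all of its confidences fall below those of class 0. A single global cut-off of 0.9 would have kept only class-0 samples, which is the kind of majority-class bias the class-balancing step is meant to counter.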

Last updated: 2020-12-08