Towards an improved label noise proportion estimation in small data: a Bayesian approach,International Journal of Machine Learning and Cybernetics



International Journal of Machine Learning and Cybernetics (IF 5.6), Pub Date: 2021-09-14, DOI: 10.1007/s13042-021-01423-4
Jakramate Bootkrajang, Jeerayut Chaijaruwanich


Today's classification tasks are becoming increasingly complex, which inevitably leads to unanticipated compromises in the quality of data labels. In this paper, we consider learning label-noise-robust classifiers, focusing on tasks where the number of training examples is small relative to the number of classes and the data dimensionality. In such cases, existing label noise models tend to estimate the noise proportions inaccurately, leading to suboptimal performance. To alleviate this problem, we formulated a regularised label noise model capable of expressing a preference over the noise parameters. In addition, we treated the regularisation from a Bayesian perspective so that the regularisation parameters can be inferred from the data through the noise model, thereby facilitating model selection in the presence of label noise. The result is a more data-efficient and computationally efficient Bayesian label noise model that can be incorporated into any probabilistic classifier, including data-intensive ones such as deep neural networks. We demonstrated the generality of the proposed method by integrating it with logistic regression, multinomial logistic regression and convolutional neural networks. Extensive empirical evaluations demonstrate that, when data is scarce, the proposed regularised label noise model significantly improves on existing models in both the quality of the noise parameter estimates and the classification accuracy, and that it is no worse than existing approaches when training data is abundant.
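The abstract does not give the model's equations. Label noise models of this kind are commonly written with a flip matrix Γ, where Γ[j][k] = P(observed label k | true label j), so that a classifier's posterior over true labels is mapped to a distribution over the observed, possibly noisy, labels. The Python sketch below illustrates one way a "preference on the noise parameters" could be expressed. It is an assumption, not the authors' implementation: the Dirichlet prior on each row of Γ, the hypothetical names `map_em_flip_matrix` and `alpha`, the fixed classifier posterior, and the hand-set hyperparameters are all illustrative choices. The paper instead infers its regularisation parameters from the data.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def noisy_label_probs(p_true: Sequence[float], gamma: Matrix) -> List[float]:
    """P(observed label = k | x) = sum_j P(true label = j | x) * gamma[j][k]."""
    n = len(gamma)
    return [sum(p_true[j] * gamma[j][k] for j in range(n)) for k in range(n)]


def map_em_flip_matrix(p_true: Sequence[Sequence[float]], y_obs: Sequence[int],
                       alpha: Matrix, n_iter: int = 50) -> Matrix:
    """MAP estimate of gamma[j][k] = P(observed k | true j) by EM.

    Assumptions (illustrative, not the paper's exact model):
    - p_true[i][j] is a fixed classifier posterior P(true label = j | x_i);
    - each row of gamma has a Dirichlet(alpha[j]) prior with alpha >= 1,
      so alpha[j][k] - 1 acts as a pseudo-count (the "regulariser").
    """
    n = len(alpha)
    if any(a < 1.0 for row in alpha for a in row):
        raise ValueError("this sketch requires every alpha >= 1")
    # Initialise at the prior mean, which is strictly positive.
    gamma = [[a / sum(row) for a in row] for row in alpha]
    for _ in range(n_iter):
        # E-step: expected count of (true j, observed k) pairs.
        counts = [[0.0] * n for _ in range(n)]
        for p, k in zip(p_true, y_obs):
            weights = [p[j] * gamma[j][k] for j in range(n)]
            total = sum(weights)
            for j in range(n):
                counts[j][k] += weights[j] / total
        # M-step: add pseudo-counts (alpha - 1) and renormalise each row.
        new_gamma = []
        for j in range(n):
            row = [counts[j][k] + alpha[j][k] - 1.0 for k in range(n)]
            s = sum(row)
            # With no evidence for row j and a flat prior, keep the previous row.
            new_gamma.append([v / s for v in row] if s > 0 else gamma[j])
        gamma = new_gamma
    return gamma


# Tiny example: two classes, three examples with confident (one-hot) posteriors.
p_true = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y_obs = [0, 1, 1]                       # the second example's label looks flipped
flat = [[1.0, 1.0], [1.0, 1.0]]         # no preference: plain maximum likelihood
near_diag = [[9.0, 1.0], [1.0, 9.0]]    # hypothetical prior favouring little noise

gamma_ml = map_em_flip_matrix(p_true, y_obs, flat)
gamma_map = map_em_flip_matrix(p_true, y_obs, near_diag)
```

With only two class-0 examples, one of them apparently flipped, the flat prior estimates a 50% flip rate for class 0 (`gamma_ml[0][1] == 0.5`), while the near-diagonal prior pulls it down to 10% (`gamma_map[0][1]` is 0.1). This is the kind of small-sample misestimation of noise proportions that a regularised noise model is meant to curb. A Dirichlet prior was chosen here only because its MAP update reduces to adding pseudo-counts.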


Updated: 2021-09-15
