Binary classification with ambiguous training data
Machine Learning ( IF 7.5 ) Pub Date : 2020-11-03 , DOI: 10.1007/s10994-020-05915-2
Naoya Otani , Yosuke Otsubo , Tetsuya Koike , Masashi Sugiyama

In supervised learning, we often face ambiguous (A) samples that are difficult to label even for domain experts. In this paper, we consider a binary classification problem in the presence of such A samples. This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples. It is also different from 3-class classification with positive (P), negative (N), and A classes, since we do not want to classify test samples into the A class. Our proposed method extends binary classification with a reject option, which trains a classifier and a rejector simultaneously using P and N samples based on the 0-1-$c$ loss with rejection cost $c$. More specifically, we propose to train a classifier and a rejector under the 0-1-$c$-$d$ loss using P, N, and A samples, where $d$ is the misclassification penalty for ambiguous samples. In our practical implementation, we use a convex upper bound of the 0-1-$c$-$d$ loss for computational tractability. Numerical experiments demonstrate that our method can successfully utilize the additional information brought by such A training data.
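The 0-1-$c$-$d$ loss described above can be illustrated with a minimal sketch. The assumptions here go beyond the abstract: we take the standard reject-option convention (reject when the rejector score $r(x) \le 0$, paying cost $c$), and we assume an A sample incurs the penalty $d$ when it is *accepted* rather than rejected. The function name and label encoding (`0` for ambiguous) are our own, not the paper's.

```python
def zero_one_c_d_loss(y, f_x, r_x, c=0.3, d=0.2):
    """Sketch of a 0-1-c-d loss for one sample.

    y   : label, +1 (P), -1 (N), or 0 (ambiguous) -- encoding assumed here
    f_x : classifier score f(x); predicted label is sign(f_x)
    r_x : rejector score r(x); the sample is rejected when r_x <= 0
    c   : rejection cost, d : penalty for accepting an ambiguous sample
    """
    if r_x <= 0:
        return c            # rejected: pay the rejection cost c
    if y == 0:
        return d            # ambiguous sample accepted: pay penalty d
    return 1.0 if y * f_x <= 0 else 0.0   # ordinary 0-1 loss on P/N

# On P and N samples alone the loss reduces to the usual 0-1-c loss,
# so the extension only changes how A samples contribute.
empirical_risk = sum(
    zero_one_c_d_loss(y, f, r)
    for y, f, r in [(1, 0.8, 1.0), (-1, 0.5, 1.0), (0, 0.9, -1.0)]
) / 3
```

Note that this loss is discontinuous in $f$ and $r$, which is why the paper trains with a convex upper bound instead of the loss itself.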
