Transfer Learning for Class Imbalance Problems with Inadequate Data

Knowledge and Information Systems (IF 2.7), Pub Date: 2015-08-25, DOI: 10.1007/s10115-015-0870-3
Samir Al-Stouhi, Chandan K. Reddy

A fundamental problem in data mining is building robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for datasets with skewed distributions. Existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not readily available, however, developing a representative classification algorithm becomes even more difficult because of the unequal distribution between classes. We provide a unified framework that can take advantage of auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that tackles this imbalance when only a few training samples are available in the target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are insufficient; in this paper we develop a method optimized to simultaneously augment the training data and induce balance in skewed datasets. We propose a novel boosting-based instance transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
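
The abstract does not spell out the update rule, so the following is only a rough sketch of what one reweighting round in a boosting-based instance transfer classifier could look like. It assumes a TrAdaBoost-style scheme (misclassified auxiliary instances are down-weighted, misclassified target instances are up-weighted) and adds a hypothetical label-dependent factor, `minority_gain`, that boosts misclassified minority-class target instances. The function name, its parameters, and the form of the label-dependent factor are illustrative assumptions, not the authors' method.

```python
import math


def transfer_boost_update(weights, is_source, y_true, y_pred,
                          n_rounds, minority_label=1, minority_gain=2.0):
    """One reweighting round of a TrAdaBoost-style instance-transfer booster.

    ASSUMPTION: `minority_gain` is a hypothetical label-dependent factor used
    only for illustration; it is not the update rule from the paper.
    Requires at least one source and one target instance.
    """
    n_source = sum(1 for s in is_source if s)
    # Fixed decay rate for auxiliary (source) instances, TrAdaBoost-style.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    # Weighted error of the current weak learner on target instances only.
    tgt_total = sum(w for w, s in zip(weights, is_source) if not s)
    tgt_err = sum(
        w for w, s, y, p in zip(weights, is_source, y_true, y_pred)
        if not s and y != p
    ) / tgt_total
    # Clamp so beta_tgt stays finite and strictly below 1.
    tgt_err = min(max(tgt_err, 1e-10), 0.499)
    beta_tgt = tgt_err / (1.0 - tgt_err)

    new_weights = []
    for w, s, y, p in zip(weights, is_source, y_true, y_pred):
        if y != p:
            if s:
                # Misclassified source instance: likely dissimilar, shrink it.
                w *= beta_src
            else:
                # Misclassified target instance: hard example, grow it.
                w /= beta_tgt
                if y == minority_label:
                    # Hypothetical label-dependent boost for the rare class.
                    w *= minority_gain
        new_weights.append(w)

    # Renormalize so the weights form a distribution again.
    total = sum(new_weights)
    return [w / total for w in new_weights]
```

In a full booster this update would run once per round after fitting a weak learner on the weighted union of auxiliary and target data; how the label-dependent term should be set is exactly the part the paper addresses and this sketch leaves open.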

Updated: 2015-08-25
