Density-based weighting for imbalanced regression
Machine Learning (IF 4.3), Pub Date: 2021-07-07, DOI: 10.1007/s10994-021-06023-5
Michael Steininger, Konstantin Kobs, Padraig Davidson, Anna Krause, Andreas Hotho

In many real-world settings, imbalanced data impedes the performance of learning algorithms such as neural networks, particularly for rare cases. This is especially problematic for tasks that focus on those rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important given their potential consequences. While there are numerous well-studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning, which is known to have advantages over sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and, based on this weighting scheme, a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss. DenseWeight weights data points according to the rarity of their target values, estimated via kernel density estimation (KDE). DenseLoss adjusts each data point's influence on the loss according to DenseWeight, giving rare data points more influence on model training than common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training, as a single hyperparameter lets us actively decide the trade-off between focusing on common or rare cases, allowing the training of better models for rare data points.
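The core idea — weight each sample by the inverse rarity of its target value via KDE, then use those weights in a weighted loss — can be sketched as follows. This is an illustrative reconstruction based only on the abstract, not the authors' implementation; the function names, the normalization to [0, 1], and the single trade-off hyperparameter `alpha` are assumptions.

```python
import numpy as np

def dense_weight(y, alpha=1.0, eps=1e-6, bandwidth=None):
    """Density-based sample weights for regression targets (sketch):
    rare target values receive larger weights, common ones smaller.
    `alpha` controls the common-vs-rare trade-off (0 = uniform weights).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if bandwidth is None:
        # Silverman's rule of thumb for a Gaussian kernel
        bandwidth = 1.06 * y.std() * n ** (-1 / 5)
    # Gaussian KDE evaluated at every target value (O(n^2), fine for a sketch)
    diffs = (y[:, None] - y[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    # Normalize the density to [0, 1], then invert: low density -> high weight
    p = (dens - dens.min()) / (dens.max() - dens.min() + 1e-12)
    w = np.maximum(1.0 - alpha * p, eps)
    # Rescale so the mean weight is 1, keeping the overall loss magnitude stable
    return w / w.mean()

def dense_loss(y_true, y_pred, weights):
    """Weighted MSE: each sample's squared error scaled by its weight."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(weights * (y_true - y_pred) ** 2))
```

With `alpha = 0` every weight is 1 and the loss reduces to plain MSE; raising `alpha` shifts the training focus toward rare target values.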




Updated: 2021-07-08