An overlap sensitive neural network for class imbalanced data
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2021-05-18, DOI: 10.1007/s10618-021-00766-4
Shaukat Ali Shahee, Usha Ananthakumar

Class imbalance is a well-known challenge in machine learning. It occurs when one class dominates the other in the number of observations. Because of this imbalance, conventional classifiers fail to classify the minority class correctly. The challenge becomes even more severe when class overlap also occurs in imbalanced data. Although the literature offers methods that deal with class imbalance and class overlap sequentially, these methods are quite complex and not very efficient. In this paper, we propose an overlap-sensitive artificial neural network that handles class overlap and class imbalance simultaneously, along with noisy and outlier observations. The strength of this method lies in identifying the overlapping observations rather than the overlapping region, and in not using multiple classifiers, unlike other existing methods. The key idea of the proposed method is to weight each observation based on its location in the feature space before training the neural network. The performance of the proposed method is evaluated on 12 simulated data sets and 23 real-life data sets and compared with other well-known methods. The results clearly indicate the strength and ability of the proposed method across a wide variety of imbalance ratios and levels of overlap. It is also shown that the proposed method is statistically superior to the other methods in terms of different performance measures.
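The abstract does not give the weighting formula, so the following is only an illustrative sketch of what "weighting observations by their location in the feature space" could look like, not the authors' method. It assumes a k-nearest-neighbour estimate of overlap (the share of an observation's neighbours that carry a different label), an inverse-class-frequency factor for imbalance, and a zero weight for observations whose neighbours all belong to another class (treated here as noise or outliers). The function name `overlap_weights`, the parameter `k`, and the formula itself are hypothetical.

```python
import math


def overlap_weights(X, y, k=3):
    """Return one hypothetical training weight per observation.

    Assumptions (not taken from the paper): overlap is estimated from the
    k nearest neighbours, imbalance is handled by inverse class frequency,
    and fully surrounded observations are treated as noise (weight 0).
    Requires at least two observations.
    """
    n = len(X)
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1

    weights = []
    for i in range(n):
        # Euclidean distances from observation i to every other observation.
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        # Share of the nearest neighbours that belong to another class.
        overlap = sum(y[j] != y[i] for j in neighbours) / len(neighbours)
        # Inverse class frequency: rarer classes get a larger base weight.
        balance = n / (len(counts) * counts[y[i]])
        if overlap == 1.0:
            weights.append(0.0)  # assumed noise/outlier: ignored in training
        else:
            weights.append(balance * (1.0 + overlap))  # emphasise overlap
    return weights


# Toy data: four majority points, two clean minority points, and one
# minority point sitting in the middle of the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1]
weights = overlap_weights(X, y, k=3)
```

Weights of this kind would typically enter training as per-sample multipliers on the loss, so that observations in overlapping areas contribute more to the gradient and suspected noise contributes nothing. Whether the paper uses this or a different scheme cannot be determined from the abstract.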




Updated: 2021-05-19