Probability Density Machine: A New Solution of Class Imbalance Learning
Scientific Programming. Pub Date: 2021-09-09, DOI: 10.1155/2021/7555587
Ruihan Cheng, Longfei Zhang, Shiqi Wu, Sen Xu, Shang Gao, Hualong Yu

Class imbalance learning (CIL) is an important branch of machine learning because, in general, classification models find it difficult to learn from imbalanced data, and skewed data distributions frequently arise in various real-world applications. In this paper, we introduce a novel CIL solution called the Probability Density Machine (PDM). First, in the context of the Gaussian Naive Bayes (GNB) predictive model, we theoretically analyze why an imbalanced data distribution degrades the performance of the predictive model and conclude that the impact of class imbalance is associated only with the prior probability, not with the class-conditional probability of the training data. Then, in this context, we explain the rationale behind several traditional CIL techniques. Furthermore, we point out the drawback of combining GNB with these traditional CIL techniques. Next, drawing on the idea of K-nearest-neighbors probability density estimation (KNN-PDE), we propose PDM, an improved GNB-based CIL algorithm. Finally, we conduct experiments on a large number of class-imbalanced data sets, and the proposed PDM algorithm shows promising results.
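
The abstract's key analytic claim is that, under GNB, class size influences the decision score only through the prior term, while the class-conditional Gaussians are fitted within each class. The Python sketch below illustrates that point alongside the generic K-nearest-neighbors density estimate that KNN-PDE builds on; it is a hypothetical illustration under our own assumptions, not the authors' PDM implementation, and the function names `gnb_log_scores` and `knn_log_density` are invented for this example.

```python
import numpy as np

def gnb_log_scores(X_train, y_train, x, eps=1e-9):
    """Per-class GNB decision scores: log P(c) + sum_j log N(x_j; mu_cj, var_cj).
    The class prior P(c) = n_c / n is the only term driven by class size;
    the Gaussian parameters are estimated within each class separately."""
    n = len(y_train)
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        prior = len(Xc) / n                              # imbalance enters only here
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + eps
        log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        scores[c] = np.log(prior) + log_lik
    return scores

def knn_log_density(X_train, y_train, x, k=5):
    """Generic K-nearest-neighbors density estimate per class (the idea behind
    KNN-PDE): p_hat(x | c) ~ k / (n_c * V_d(r_k)), with r_k the distance to the
    k-th nearest training point of class c. The class-shared ball-volume
    constant is dropped from the returned log-score."""
    d = X_train.shape[1]
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        dists = np.sort(np.linalg.norm(Xc - x, axis=1))
        r_k = dists[min(k, len(dists)) - 1] + 1e-12      # guard against r_k = 0
        scores[c] = np.log(k) - np.log(len(Xc)) - d * np.log(r_k)
    return scores

# Tiny usage example on synthetic 9:1 imbalanced data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(2.0, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
x_query = np.array([1.8, 1.8])
print(gnb_log_scores(X, y, x_query))   # log(0.1) prior pulls the minority score down
print(knn_log_density(X, y, x_query))  # distance-based comparison, no prior weighting
```

In the GNB scores, the minority class is penalized by its log-prior regardless of how well its Gaussian fits the query point, which matches the abstract's claim that the imbalance effect sits entirely in the prior; the KNN-based estimate instead compares classes through local neighbor distances.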

Updated: 2021-09-09