Fuzzy Support Vector Machine With Relative Density Information for Classifying Imbalanced Data
IEEE Transactions on Fuzzy Systems (IF 10.7) Pub Date: 2-9-2019, DOI: 10.1109/tfuzz.2019.2898371
Hualong Yu, Changyin Sun, Xibei Yang, Shang Zheng, Haitao Zou

The fuzzy support vector machine (FSVM) has been combined with class imbalance learning (CIL) strategies to address the problem of classifying skewed data. However, existing approaches have several inherent drawbacks that lead to inaccurate estimation of the prior data distribution and, in turn, degrade the quality of the classification model. To solve this problem, this paper presents a more robust method for extracting prior data distribution information, named relative density, along with two novel FSVM-CIL algorithms based on it. In the proposed algorithms, a strategy akin to K-nearest-neighbors-based probability density estimation (KNN-PDE) is used to calculate the relative density of each training instance. In particular, the relative density is independent of the dimensionality of the data distribution in feature space and reflects only the significance of each instance within its own class; it is therefore more robust than absolute distance information. In addition, the relative density captures the prior data distribution information well, whether the distribution is simple or complex. Even for data with small disjuncts or large class overlap, the relative density information reflects the distribution's details well. We evaluated the proposed algorithms on a number of synthetic and real-world imbalanced datasets. The results show that the proposed algorithms clearly outperform several previous methods, especially on datasets with sophisticated distributions.
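The sketch below illustrates the general idea described in the abstract: a KNN-PDE-like estimate of each instance's density within its own class, rank-normalized into a relative density that can be used as a fuzzy membership (per-sample weight) for an SVM. It is a minimal illustration assuming scikit-learn; the function name, the choice of k, and the rank-based normalization are assumptions for illustration and may differ from the authors' exact formulation.

```python
# Minimal sketch of KNN-based relative-density weighting for FSVM-CIL.
# Hypothetical helper; the paper's exact algorithm may differ.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def relative_density_memberships(X, y, k=5):
    """Estimate a relative density for each instance within its own class
    from the distance to its k-th nearest same-class neighbor, then
    rank-normalize to (0, 1] so it can serve as a fuzzy membership weight."""
    memberships = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        Xc = X[idx]
        # k+1 neighbors because each point's nearest neighbor is itself.
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(idx))).fit(Xc)
        dist, _ = nn.kneighbors(Xc)
        kth = dist[:, -1]                     # distance to the k-th true neighbor
        density = 1.0 / (kth + 1e-12)         # KNN-PDE-like density estimate
        # Relative density: rank within the class, so the value reflects only
        # an instance's significance inside its own class, not absolute scale.
        ranks = density.argsort().argsort() + 1
        memberships[idx] = ranks / len(idx)
    return memberships

# Usage sketch: pass the memberships as per-sample weights to an SVM.
# X, y = ...  # imbalanced training data
# w = relative_density_memberships(X, y, k=5)
# clf = SVC(kernel="rbf").fit(X, y, sample_weight=w)
```

Because the weights are rank-based within each class, they are unaffected by the feature-space scale or dimensionality, which matches the robustness argument made in the abstract.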

Last updated: 2024-08-22