Feature selection for label distribution learning via feature similarity and label correlation
Information Sciences (IF 8.1), Pub Date: 2021-08-24, DOI: 10.1016/j.ins.2021.08.076
Wenbin Qian 1, Yinsong Xiong 1, Jun Yang 1, Wenhao Shu 2

Feature selection plays a crucial role in machine learning and data mining: it improves the performance of learning models by selecting a discriminative feature subset and eliminating irrelevant features. Existing feature selection methods mainly target single-label and multi-label learning; only a few address label distribution learning. Like multi-label learning, label distribution learning suffers from the “curse of dimensionality”. In label distribution learning, however, the labels related to each sample carry different levels of importance. Multi-label feature selection algorithms therefore cannot be applied directly to label distribution data, and discretizing label distribution data into multi-label data loses important supervised information. To solve this problem, this paper proposes a novel feature selection algorithm for label distribution learning. The proposed method uses neighborhood granularity to explore feature similarity and a correlation coefficient to generate the label correlations. In addition, sparse learning is used to improve robustness and control complexity. Experimental results on twelve datasets indicate that the proposed method is more effective than five state-of-the-art feature selection algorithms with respect to six representative evaluation measures.
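Two points in the abstract can be illustrated with a small sketch: discretizing label distributions discards the relative importance of labels, and a correlation coefficient can summarize how labels co-vary. The code below is not the paper's algorithm. The toy matrix `D`, the threshold rule in `discretize`, and the choice of the Pearson coefficient in `label_correlation` are all assumptions made for illustration; the abstract does not say which correlation coefficient the authors use.

```python
import numpy as np


def discretize(D, threshold):
    """Map a label distribution matrix to a 0/1 multi-label matrix.

    Hypothetical rule (not from the paper): a label counts as relevant
    when its description degree is at least `threshold`.
    """
    return (D >= threshold).astype(int)


def label_correlation(D):
    """Pearson correlation between label columns (an assumed choice)."""
    return np.corrcoef(D, rowvar=False)


# Toy label distributions: rows are samples, columns are labels,
# and each row sums to 1. These values are invented for illustration.
D = np.array([
    [0.70, 0.20, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.30, 0.60],
    [0.05, 0.15, 0.80],
])

Y = discretize(D, threshold=0.2)  # multi-label view of the same data
R = label_correlation(D)          # 3 x 3 label correlation matrix

print(Y)
print(np.round(R, 3))
```

In the first row of `D`, the degrees 0.70 and 0.20 both become 1 in `Y`, so the multi-label view can no longer tell the dominant label from a minor one. The matrix `R` is computed from the original distributions and so keeps that information.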




Updated: 2021-09-16