Unsupervised soft-label feature selection
Knowledge-Based Systems (IF 7.2), Pub Date: 2021-02-16, DOI: 10.1016/j.knosys.2021.106847
Fei Wang, Lei Zhu, Jingjing Li, Haibao Chen, Huaxiang Zhang

Unsupervised feature selection is an important task in various research fields. It is difficult to select discriminative features in the unsupervised scenario due to the absence of label guidance. Recent works employ pseudo labels to guide feature selection. However, they generate pseudo labels from the original feature space, where noise, redundancy, and outliers may degrade the quality of the pseudo labels. Besides, they ignore data fuzziness and use hard labels as the semantic supervision of feature selection, so the selected features suffer from significant information loss and semantic shortage. To tackle these problems, we propose an effective Unsupervised Soft-label Feature Selection (USFS) model, which performs soft-label learning and simultaneously guides the unsupervised feature selection process with the learned soft labels. Specifically, we transform the data to a low-dimensional subspace, where an affinity matrix with a sparse constraint is learned based on the local distances. The affinity matrix is taken as the soft-label matrix and further employed to guide the ultimate feature selection process. A simple yet efficient optimization method is derived to iteratively solve the formulated problem. Promising experimental results on widely tested benchmarks demonstrate the superiority of the proposed method compared with state-of-the-art approaches. For the purpose of reproducibility, we provide the code and testing datasets at https://github.com/wang-feifei/USFS-code.
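The pipeline outlined in the abstract (subspace projection, sparse affinity learning from local distances, soft-label-guided selection) can be illustrated with a rough Python sketch. This is not the authors' USFS implementation, and every name in it is hypothetical: `pca_project` uses plain PCA as an assumed stand-in for the learned subspace, `sparse_affinity` uses the well-known closed-form k-nearest-neighbor simplex weights as one possible sparse affinity, and `rank_features` replaces the paper's joint optimization with a simple ridge regression onto the soft labels followed by row-norm ranking.

```python
import numpy as np


def pca_project(X, m):
    """Project centered data onto its top-m principal directions.
    Assumption: PCA stands in for the subspace that USFS learns jointly."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T


def sparse_affinity(Z, k):
    """Row-stochastic affinity with at most k nonzeros per row, built from
    squared Euclidean distances to the k nearest neighbors (assumed form)."""
    n = Z.shape[0]
    if not 1 <= k <= n - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")
    sq = np.sum(Z ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    np.fill_diagonal(D, np.inf)  # a sample is never its own neighbor
    order = np.argsort(D, axis=1)[:, :k + 1]  # k+1 nearest, ascending
    S = np.zeros((n, n))
    for i in range(n):
        d = D[i, order[i]]
        denom = k * d[k] - d[:k].sum()
        if denom > 1e-12:
            # Closer neighbors get larger weights; the (k+1)-th gets zero.
            S[i, order[i, :k]] = (d[k] - d[:k]) / denom
        else:
            # All k+1 distances coincide: fall back to uniform weights.
            S[i, order[i, :k]] = 1.0 / k
    return S


def rank_features(X, S, num_features, lam=1.0):
    """Regress the soft labels S on the features with ridge regression and
    rank features by the row norms of the coefficients (a simplification)."""
    Xc = X - X.mean(axis=0)
    d = Xc.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ S)
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)[:num_features]


# Usage on synthetic data (sizes and parameters are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
S = sparse_affinity(pca_project(X, m=3), k=5)
selected = rank_features(X, S, num_features=3)
```

Each row of `S` sums to one, so it can be read as a soft assignment of a sample over its neighbors rather than a single hard label. The actual model learns the subspace, the affinity matrix, and the selection weights together in one iterative procedure; the three separate steps here only show how the pieces relate.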




Updated: 2021-03-01