Greedy Search Algorithms for Unsupervised Variable Selection: A Comparative Study
arXiv - CS - Machine Learning Pub Date : 2021-03-03 , DOI: arxiv-2103.02687
Federico Zocco, Marco Maggipinto, Gian Antonio Susto, Seán McLoone

Dimensionality reduction is an important step in the development of scalable and interpretable data-driven models, especially when there is a large number of candidate variables. This paper focuses on dimensionality reduction based on unsupervised variable selection, and in particular on unsupervised greedy selection methods, which various researchers have proposed as computationally tractable approximations to optimal subset selection. These methods are distinguished from each other largely by the selection criterion adopted; the criteria include squared correlation, variance explained, mutual information and frame potential. Motivated by the absence in the literature of a systematic comparison of these methods, we present a critical evaluation of seven unsupervised greedy variable selection algorithms on both simulated and real-world case studies. We also review the theoretical results, related to the concept of submodularity, that provide performance guarantees and enable efficient implementations for certain classes of greedy selection function. Furthermore, we introduce and evaluate, for the first time, a lazy implementation of the variance-explained-based forward selection component analysis (FSCA) algorithm. Our experimental results show that: (1) selection methods based on variance explained and mutual information yield smaller approximation errors than frame potential; (2) the lazy FSCA implementation has similar performance to FSCA while being an order of magnitude faster to compute, making it the algorithm of choice for unsupervised variable selection.
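
The abstract names greedy forward selection by variance explained but does not spell out the procedure. The following is a minimal sketch of one common formulation, assuming mean-centred columns: at each step, pick the column of the residual matrix that explains the most residual variance, then remove its contribution from every column. The names `fsca` and `dot`, the `tol` parameter, and the deflation-based formulation are illustrative assumptions, not the authors' implementation, and this is the plain greedy search rather than the lazy variant the paper introduces.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fsca(X, k, tol=1e-12):
    """Plain (non-lazy) greedy selection by variance explained.

    X is a list of rows (samples) whose columns are assumed mean-centred.
    Returns the indices of up to k selected columns, in selection order.
    """
    n_cols = len(X[0])
    # Store the residual matrix column-wise; initially it equals X.
    R = [[row[j] for row in X] for j in range(n_cols)]
    selected = []
    for _ in range(k):
        best_j, best_gain = None, tol
        for j in range(n_cols):
            if j in selected:
                continue
            norm2 = dot(R[j], R[j])
            if norm2 <= tol:
                continue  # column already explained by earlier selections
            # Residual variance explained by regressing every column on R[j].
            gain = sum(dot(R[j], c) ** 2 for c in R) / norm2
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # nothing left to explain
        z = R[best_j][:]
        zz = dot(z, z)
        # Deflate: remove the component along z from every residual column.
        R = [[ci - (dot(z, c) / zz) * zi for ci, zi in zip(c, z)] for c in R]
        selected.append(best_j)
    return selected


# Hypothetical toy data: column 2 equals column 0 plus column 1,
# and column 3 is orthogonal to the other three.
X_demo = [
    [3, 1, 4, 2],
    [-3, 1, -2, -2],
    [3, -1, 2, -2],
    [-3, -1, -4, 2],
]
print(fsca(X_demo, 2))  # prints [2, 3]
```

On the toy data, column 2 is chosen first because it carries information about both column 0 and column 1; once its contribution is removed, the orthogonal column 3 explains the most remaining variance. Each step rescans every candidate, which is the cost a lazy implementation aims to avoid.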

Updated: 2021-03-05