

Feature selection using autoencoders with Bayesian methods to high-dimensional data
Journal of Intelligent & Fuzzy Systems (IF 2), Pub Date: 2021-08-17, DOI: 10.3233/jifs-211348
Lei Shu₁, Kun Huang₂, Wenhao Jiang₁, Wenming Wu₁, Hongling Liu₁


Using real-world data directly in machine learning tasks easily leads to poor generalization, since such data are usually high-dimensional and limited. By learning low-dimensional representations of high-dimensional data, feature selection can retain the features that are useful for machine learning tasks, and these features can then be used to train machine learning models effectively. Selecting features from high-dimensional data nevertheless remains a challenge. To address this issue, this paper proposes a novel feature selection approach: a hybrid of an autoencoder and Bayesian methods. First, Bayesian methods are embedded in the proposed autoencoder as a special hidden layer, with the aim of increasing precision when selecting non-redundant features. Then, the other hidden layers of the autoencoder are used for non-redundant feature selection. Finally, in a comparison with mainstream feature selection approaches, the proposed method outperforms them. We find that, for feature selection, combining autoencoders with probabilistic correction methods is more meaningful than stacking architectures or adding constraints to autoencoders. We also demonstrate that stacked autoencoders are better suited to large-scale feature selection, whereas sparse autoencoders are beneficial when a smaller number of features is selected. We indicate that the proposed method provides a theoretical reference for analyzing the optimality of feature selection.
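To make the general idea of autoencoder-based feature selection concrete, here is a minimal sketch in Python with NumPy. It is not the authors' method: the abstract does not specify how the Bayesian hidden layer is built, so that layer is omitted entirely. The function names `train_autoencoder` and `select_features`, the single tanh hidden layer, the hyperparameters, and the rule of scoring each input feature by the norm of its encoder weights are all assumptions made for illustration.

```python
import numpy as np


def train_autoencoder(X, n_hidden=4, lr=0.01, epochs=200, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    Hypothetical helper: tanh encoder, linear decoder, squared-error loss.
    Returns the encoder weights, decoder weights, and per-epoch losses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d))
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)        # low-dimensional code
        X_hat = H @ W_dec             # linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared error, up to a constant factor.
        grad_dec = H.T @ err / n
        grad_H = err @ W_dec.T
        grad_enc = X.T @ (grad_H * (1.0 - H ** 2)) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses


def select_features(W_enc, k):
    """Return indices of the k input features with the largest encoder weight norm.

    Assumed scoring rule: a feature whose encoder weights stay near zero
    contributes little to the learned representation.
    """
    scores = np.linalg.norm(W_enc, axis=1)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.normal(size=(200, 2))
    # Four features driven by two latent factors, plus six low-variance noise features.
    informative = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] + Z[:, 1], Z[:, 0] - Z[:, 1]])
    noise = 0.1 * rng.normal(size=(200, 6))
    X = np.hstack([informative, noise])
    W_enc, _, losses = train_autoencoder(X)
    print("selected feature indices:", select_features(W_enc, 3))
    print("reconstruction loss: %.4f -> %.4f" % (losses[0], losses[-1]))
```

A stacked or sparse variant, as mentioned in the abstract, would add further hidden layers or a sparsity penalty on the code `H`; neither is shown here.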


Updated: 2021-08-17
