Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion
Neural Computing and Applications (IF 4.5), Pub Date: 2020-06-05, DOI: 10.1007/s00521-020-04971-y
Afef Ben Brahim

Feature selection is frequently used as a preprocessing step for data mining and is attracting growing attention due to the increasing amounts of data emerging from different domains. High data dimensionality increases noise and thus the error of learning algorithms. Filter methods for feature selection are especially fast and useful for high-dimensional datasets. Existing methods focus on producing feature subsets that improve predictive performance, but they often suffer from instability. Instance-based filters, for example, are considered among the most effective methods; they rank features based on instance neighborhoods. However, because the feature weights fluctuate with the instances, small changes in the training data yield a different selected feature subset. On the other hand, some other filters generate stable results but deliver only modest predictive performance. The absence of a trade-off between stability and classification accuracy decreases the reliability of feature selection results. To deal with this issue, we propose filter methods that improve the stability of feature selection while preserving predictive accuracy and without increasing the complexity of the feature selection algorithms. The proposed approaches first use the strength of instance learning to identify initial sets of relevant features, then exploit aggregation techniques to increase the stability of the final set in a second stage. Two classification algorithms are used to evaluate the predictive performance of our instance-based filters against state-of-the-art algorithms. The results show that our methods improve both classification accuracy and feature selection stability on high-dimensional datasets.
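To make the two-stage idea concrete, the sketch below combines an instance-based filter (basic Relief) with bootstrap rank aggregation and a greedy correlation-based redundancy filter. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the helper names (relief_weights, stable_select), the rank-sum aggregation, and the corr_max threshold are all illustrative choices.

```python
# A minimal sketch of instance-based ranking + redundancy elimination +
# subset fusion. Not the paper's algorithm; names and thresholds are
# illustrative assumptions.
import numpy as np

def relief_weights(X, y):
    """Instance-based filter (basic Relief): a feature gains weight when it
    separates each instance from its nearest miss better than from its
    nearest hit. Assumes both classes appear in the sample."""
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # avoid divide-by-zero
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / span             # per-feature distances to x_i
        dist = diff.sum(axis=1)
        dist[i] = np.inf                           # exclude x_i itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class
        w += diff[miss] - diff[hit]
    return w / n

def stable_select(X, y, k=20, n_runs=30, corr_max=0.9, seed=0):
    """Stage 2: aggregate Relief rankings over bootstrap resamples, then keep
    the top-k features after a greedy correlation-based redundancy filter."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    rank_sum = np.zeros(d)
    for _ in range(n_runs):
        idx = rng.choice(n, size=n, replace=True)  # bootstrap resample
        w = relief_weights(X[idx], y[idx])
        rank_sum += np.argsort(np.argsort(-w))     # rank 0 = best feature
    order = np.argsort(rank_sum)                   # smallest summed rank first
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in order:                                # redundancy elimination
        if all(corr[j, s] < corr_max for s in selected):
            selected.append(j)
        if len(selected) == k:
            break
    return np.array(selected)
```

The stability gain comes from the fusion step: a single Relief run is sensitive to the particular instances it sees, whereas averaging ranks over many perturbed training sets damps that sensitivity, so the final top-k subset changes little when the training data changes slightly.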



