A New Representation in PSO for Discretization-Based Feature Selection
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2017-06-23, DOI: 10.1109/tcyb.2017.2714145
Binh Tran, Bing Xue, Mengjie Zhang

In machine learning, discretization and feature selection (FS) are important preprocessing techniques for improving the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (univariate discretization). This scheme relies on the assumption that each feature influences the task independently, which may not hold when feature interactions exist. Univariate discretization may therefore degrade the performance of the FS stage, since information about feature interactions may be lost during discretization. Initial results of our previously proposed method [evolve particle swarm optimization (EPSO)] showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO), which employs a new representation that can reduce the search space of the problem and a new fitness function that better evaluates candidate solutions to guide the search. The results on ten high-dimensional datasets show that PPSO selects fewer than 5% of the features on every dataset. Compared with the two-stage approach, which uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets, with a smaller number of selected features on six datasets. Furthermore, PPSO also outperforms the three compared (traditional) methods and performs similarly to one method on most datasets in terms of both generalization ability and learning capacity.
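
The single-stage idea described above can be illustrated with a minimal sketch. It is not the authors' PPSO: the encoding (one cut-point per feature, with an out-of-range cut-point meaning the feature is dropped), the lookup-table fitness with its `penalty` term, the toy XOR data, and all function names are assumptions made for illustration. Only the BBPSO update itself follows the standard bare-bones rule, in which each dimension is sampled from a Gaussian centred midway between the personal best and the global best. The toy labels depend on the interaction of two features, so neither feature is informative on its own, which is the situation where univariate discretization can lose information.

```python
import random


def make_xor_data(n, seed=0):
    """Toy data (assumption): label is the XOR of two thresholded features,
    plus one irrelevant feature. All features lie in [0, 1)."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [int((r[0] > 0.5) != (r[1] > 0.5)) for r in X]
    return X, y


def decode(position, lows, highs):
    """Hypothetical encoding: one cut-point per feature; a cut-point outside
    the feature's observed range [low, high] drops that feature."""
    return {j: c for j, c in enumerate(position) if lows[j] <= c <= highs[j]}


def fitness(position, X, y, lows, highs, penalty=0.01):
    """Illustrative fitness (not the paper's): training accuracy of a
    majority-vote lookup table on the binarized, selected features,
    minus a small penalty per selected feature."""
    cuts = sorted(decode(position, lows, highs).items())
    if not cuts:
        return 0.0
    votes = {}
    for row, label in zip(X, y):
        key = tuple(row[j] > c for j, c in cuts)
        counts = votes.setdefault(key, {})
        counts[label] = counts.get(label, 0) + 1
    correct = sum(max(counts.values()) for counts in votes.values())
    return correct / len(y) - penalty * len(cuts)


def bbpso_step(pbest, gbest, rng):
    """Bare-bones PSO update: sample each dimension from a Gaussian whose
    mean is the midpoint of pbest and gbest and whose standard deviation
    is the distance between them."""
    return [rng.gauss((p + g) / 2.0, abs(p - g)) for p, g in zip(pbest, gbest)]


def search(X, y, n_particles=20, n_iters=30, seed=0):
    """Search cut-points and feature subset together in a single stage."""
    rng = random.Random(seed)
    n_features = len(X[0])
    lows = [min(r[j] for r in X) for j in range(n_features)]
    highs = [max(r[j] for r in X) for j in range(n_features)]

    def init(j):
        # Widened range so that some features start out dropped.
        span = highs[j] - lows[j]
        return rng.uniform(lows[j] - span, highs[j] + span)

    pbests = [[init(j) for j in range(n_features)] for _ in range(n_particles)]
    pfits = [fitness(p, X, y, lows, highs) for p in pbests]
    for _ in range(n_iters):
        gbest = pbests[max(range(n_particles), key=pfits.__getitem__)]
        for i in range(n_particles):
            cand = bbpso_step(pbests[i], gbest, rng)
            f = fitness(cand, X, y, lows, highs)
            if f > pfits[i]:
                pbests[i], pfits[i] = cand, f
    best = max(range(n_particles), key=pfits.__getitem__)
    return decode(pbests[best], lows, highs), pfits[best]


if __name__ == "__main__":
    X, y = make_xor_data(200, seed=1)
    selected_cuts, score = search(X, y)
    print("selected features and cut-points:", selected_cuts)
    print("fitness:", score)
```

Because a particle encodes the cut-points and the selection decision together, a candidate is always scored on the jointly discretized features, so an interaction such as the XOR above can be rewarded even though each feature looks uninformative when examined alone.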

Updated: 2017-06-23