Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction, Expert Systems with Applications



Expert Systems with Applications (IF 7.5), Pub Date: 2020-02-04, DOI: 10.1016/j.eswa.2020.113277
Mengmeng Li, Haofeng Wang, Lifang Yang, You Liang, Zhigang Shang, Hong Wan

Dimensionality reduction is a basic and critical technique for data mining, especially in the current “big data” era. Feature selection and feature extraction are two different types of dimensionality reduction methods, each with its own pros and cons. In this paper, we combine multi-strategy feature selection with grouped feature extraction and propose a novel fast hybrid dimensionality reduction method that incorporates the advantages of both in removing irrelevant and redundant information. First, the intrinsic dimensionality of the data set is estimated by maximum likelihood estimation. Fisher Score and Information Gain based feature selection are then used as multi-strategy methods to remove irrelevant features. The selected features are grouped into a certain number of clusters, with the redundancy among them as the clustering criterion. Within each cluster, Principal Component Analysis (PCA) based feature extraction is carried out to remove redundant information. Four classical classifiers and representation entropy are used to evaluate the classification performance and the information loss of the reduced set. The runtime results show that the proposed hybrid method is consistently much faster than the other three methods on almost all of the data sets used. Meanwhile, the proposed method shows competitive classification performance, with essentially no significant difference from the other methods. In summary, the proposed method reduces the dimensionality of the raw data quickly, with excellent efficiency and competitive classification performance relative to the compared methods.
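The select, group, and extract stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the two rankings are combined by taking the union of their top-k features, redundancy is measured by absolute Pearson correlation with a greedy threshold grouping, Information Gain is computed after equal-width binning, and only the first principal component of each group is kept. All function names and parameters (`hybrid_reduce`, `k`, `threshold`, `bins`) are hypothetical. The maximum likelihood intrinsic dimensionality estimate, the classifiers, and the representation entropy evaluation are omitted.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class over within-class scatter."""
    mean_all = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)

def entropy(labels):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(X, y, bins=10):
    """Information gain of y per feature after equal-width binning (assumption)."""
    h_y = entropy(y)
    gains = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        binned = np.digitize(X[:, j], edges[1:-1])
        h_cond = sum((binned == b).mean() * entropy(y[binned == b])
                     for b in np.unique(binned))
        gains[j] = h_y - h_cond
    return gains

def group_by_correlation(X, threshold=0.7):
    """Greedy grouping: a feature joins a seed if |corr| >= threshold (assumption)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if corr[seed, j] >= threshold]
        unassigned = [j for j in unassigned if j not in members]
        groups.append(members)
    return groups

def first_component(Xg):
    """Project one feature group onto its first principal component via SVD."""
    Xc = Xg - Xg.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def hybrid_reduce(X, y, k=10, threshold=0.7):
    """Select by two criteria, group redundant features, extract one PC per group."""
    top_fisher = np.argsort(fisher_score(X, y))[::-1][:k]
    top_gain = np.argsort(information_gain(X, y))[::-1][:k]
    selected = np.union1d(top_fisher, top_gain)  # union rule is an assumption
    groups = group_by_correlation(X[:, selected], threshold)
    Z = np.column_stack([first_component(X[:, selected[g]]) for g in groups])
    return Z, selected, groups

# Synthetic demo data: 2 informative features, 2 near-copies, 16 noise features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
base = y[:, None] + 0.3 * rng.normal(size=(200, 2))
copies = base + 0.05 * rng.normal(size=(200, 2))
noise = rng.normal(size=(200, 16))
X = np.hstack([base, copies, noise])

Z, selected, groups = hybrid_reduce(X, y, k=4)
print("selected features:", selected)
print("groups (indices into selected):", groups)
print("reduced shape:", Z.shape)
```

In this sketch each group yields exactly one extracted feature, so the reduced dimensionality equals the number of groups. A fuller version would presumably tie the number of clusters or retained components to the estimated intrinsic dimensionality, which the abstract mentions but does not detail.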


Updated: 2020-02-04
