BSSReduce: an O(|U|) incremental feature selection approach for large-scale and high-dimensional data
IEEE Transactions on Fuzzy Systems (IF 11.9), Pub Date: 2018-12-01, DOI: 10.1109/tfuzz.2018.2825308
Ke Gong, Yong Wang, Maozeng Xu, Zhi Xiao

With the advent of the era of big data, datasets have become larger than ever. Feature selection, a fundamental task in pattern recognition, prediction, and data mining, has recently attracted wide attention. However, extant feature-selection methods have a time complexity of $O(\left|C\right|^x\left|U\right|^y)$, which is the bottleneck preventing people from exploring knowledge in large-scale or high-dimensional datasets. Based on bijective soft sets, we propose a new rationale for feature selection that can help break this bottleneck. Building on it, this paper proposes an $O(\left|U\right|)$ feature-selection method whose computational time increases linearly only with the number of instances. To validate the proposed method, we conduct extensive experiments on University of California Irvine (UCI) datasets, including large-scale and high-dimensional datasets containing four million instances and over three million features. The results reveal that the proposed method is efficient and effective and outperforms traditional methods in runtime, which can save massive computing resources. Moreover, the proposed method can be applied to feature selection for large-scale and gigantic-dimensional datasets, which are difficult to process with traditional methods.

Updated: 2018-12-01