A concise method for feature selection via normalized frequencies
arXiv - CS - Machine Learning Pub Date : 2021-06-10 , DOI: arxiv-2106.05814
Song Tan, Xia He

Feature selection is an important part of building a machine learning model. By eliminating redundant or misleading features from the data, the model can achieve better performance while reducing the demand on computing resources. Feature selection is mostly implemented with metaheuristic algorithms, such as swarm intelligence algorithms and evolutionary algorithms. However, these are relatively complex and slow. In this paper, a concise method is proposed for universal feature selection. The proposed method fuses the filter method and the wrapper method, rather than merely combining them. One-hot encoding is used to preprocess the dataset, and random forest is used as the classifier. The method uses normalized frequencies to assign a value to each feature, and these values are then used to find the optimal feature subset. Furthermore, we propose a novel approach to exploiting the outputs of mutual information, which gives the experiments a better starting point. Two real-world datasets from the field of intrusion detection were used to evaluate the proposed method. The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
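
The abstract does not spell out how the normalized frequencies are computed or how the mutual information outputs are exploited, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It shows a standard mutual information score for discrete (for example one-hot encoded) features as the filter step, and a hypothetical `normalized_frequencies` helper that counts how often each feature appears in a set of well-performing subsets and scales by the largest count. The toy data, the helper names, and the hard-coded `good_subsets` (standing in for subsets a random forest wrapper would have rated well) are all illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def normalized_frequencies(subsets, n_features):
    """Hypothetical scoring: how often each feature index appears in `subsets`,
    scaled so that the most frequent feature gets 1.0."""
    counts = Counter(i for s in subsets for i in s)
    top = max(counts.values(), default=0)
    return [counts[i] / top if top else 0.0 for i in range(n_features)]


# Toy data (made up): rows are one-hot encoded samples, labels are binary.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
y = [1, 1, 0, 0]
n_features = len(X[0])

# Filter step: rank features by mutual information with the label.
mi_scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
ranking = sorted(range(n_features), key=lambda j: mi_scores[j], reverse=True)

# Wrapper step stand-in (assumption): subsets a classifier would have rated well.
good_subsets = [{0}, {0, 2}, {0, 1}, {0, 2}]
freqs = normalized_frequencies(good_subsets, n_features)
```

In this toy setup, feature 0 matches the label exactly, so it receives the highest mutual information and the highest normalized frequency. In practice the subsets would come from evaluating candidate feature sets with a classifier rather than being hard-coded.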

Updated: 2021-06-11