Least Loss: A simplified filter method for feature selection
Information Sciences, Pub Date: 2020-05-19, DOI: 10.1016/j.ins.2020.05.017
Fadi Thabtah, Firuz Kamalov, Suhel Hammoud, Seyed Reza Shahamiri

Identifying the relevant set of features in a dataset is an important part of data analytics. Discarding significant variables or keeping irrelevant ones can substantially affect the performance of the learning algorithm during knowledge discovery. In this paper, a feature selection method called Least Loss (L2) is proposed that significantly reduces the dimensionality of data by disposing of weakly correlated variables in a robust manner without diminishing the predictive performance of classifiers. The proposed method quantifies the similarity between the observed and expected probabilities and generates a score for each independent variable, which makes it simple and intuitive. The method was evaluated by comparing its performance against the Information Gain (IG) and Chi Square (CHI) feature selection methods on 27 different datasets modeled using a probabilistic classifier. The results reveal that L2 is highly competitive with respect to error rate, precision, and recall while substantially reducing the number of selected variables in the datasets. The study should be of interest to data analysts, scholars, and domain experts who work on applications with large numbers of features using statistical analysis methods.
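
The abstract describes scoring each independent variable by comparing observed and expected probabilities, but it does not give the formula. The sketch below is one plausible reading, not the authors' definition: it assumes categorical features, takes the "expected" probability to be the product of the marginals (what independence from the class would predict), and sums the squared gaps. The names `squared_gap_score` and `rank_features` are hypothetical.

```python
from collections import Counter


def squared_gap_score(feature_values, labels):
    """Score one categorical feature against the class labels.

    ASSUMPTION: the score is the sum, over every (value, class) pair, of
    (observed joint probability - expected probability under independence)^2.
    This is an illustrative reading of the abstract, not the paper's formula.
    """
    n = len(labels)
    if n == 0 or len(feature_values) != n:
        raise ValueError("feature_values and labels must be non-empty and equal in length")

    joint_counts = Counter(zip(feature_values, labels))
    value_counts = Counter(feature_values)
    class_counts = Counter(labels)

    score = 0.0
    for value, value_count in value_counts.items():
        for cls, class_count in class_counts.items():
            observed = joint_counts.get((value, cls), 0) / n
            expected = (value_count / n) * (class_count / n)
            score += (observed - expected) ** 2
    return score


def rank_features(columns, labels):
    """Return (feature name, score) pairs sorted from highest to lowest score."""
    scores = {name: squared_gap_score(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    columns = {
        "noise": [0, 1, 0, 1],   # unrelated to the class
        "signal": [0, 0, 1, 1],  # identical to the class
    }
    for name, score in rank_features(columns, labels):
        print(name, score)
```

Under this reading, a feature that is independent of the class scores 0 and a more strongly associated feature scores higher, so a filter would keep the top-ranked features and drop the rest. How the paper sets the cut-off is not stated in the abstract.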




Updated: 2020-05-19