Statistical model for reproducibility in ranking-based feature selection
Knowledge and Information Systems (IF 2.7), Pub Date: 2020-11-05, DOI: 10.1007/s10115-020-01519-3
Ari Urkullu, Aritz Pérez, Borja Calvo

The stability of feature subset selection algorithms has become crucial in real-world problems because experimental results need to be consistent across replicates. In this paper, we analyze the reproducibility of ranking-based feature subset selection algorithms. When applied to data, this family of algorithms builds an ordering of the variables according to a measure of relevance. To quantify the reproducibility of these algorithms, we propose a model that takes into account the subsets of top-ranked features of every size. The model is fitted to data by minimizing an error function related to the expected values of Kuncheva’s consistency index for those subsets. Once fitted, the model provides practical information about the feature subset selection algorithm analyzed, such as a measure of its expected reproducibility or its estimated area under the receiver operating characteristic curve for the identification of relevant features. We test our model empirically using both synthetic data and a wide range of real data. The results show that our proposal can be used to analyze ranking-based feature subset selection algorithms in terms of both their reproducibility and their performance.
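
For readers unfamiliar with the index mentioned in the abstract, the following is a minimal sketch of Kuncheva's consistency index evaluated on the top-k subsets of two rankings, for every subset size k. It is not the authors' model, which works with the expected values of the index and an error function the abstract does not spell out; it only computes the observed index for one pair of rankings. The function names `kuncheva_index` and `top_k_consistency` are hypothetical, and the sketch assumes each ranking is a permutation of the feature indices 0..n-1, listed from most to least relevant.

```python
from typing import List, Sequence


def kuncheva_index(a: Sequence[int], b: Sequence[int], n: int) -> float:
    """Kuncheva's consistency index for two subsets of equal size k
    drawn from n features: (r*n - k^2) / (k*(n - k)), with r = |a & b|.
    """
    k = len(a)
    if k != len(b):
        raise ValueError("both subsets must have the same size")
    if not 0 < k < n:
        raise ValueError("the index is only defined for 0 < k < n")
    r = len(set(a) & set(b))  # number of features shared by both subsets
    return (r * n - k * k) / (k * (n - k))


def top_k_consistency(rank_a: Sequence[int], rank_b: Sequence[int]) -> List[float]:
    """Index of the top-k subsets of two rankings for k = 1, ..., n-1.

    Assumption: each ranking is a permutation of 0..n-1, ordered from
    the most relevant feature to the least relevant one.
    """
    n = len(rank_a)
    if n != len(rank_b):
        raise ValueError("both rankings must cover the same features")
    return [kuncheva_index(rank_a[:k], rank_b[:k], n) for k in range(1, n)]


if __name__ == "__main__":
    # Two replicates that agree on the top two features only.
    replicate_1 = [0, 1, 2, 3, 4]
    replicate_2 = [1, 0, 4, 3, 2]
    for k, value in enumerate(top_k_consistency(replicate_1, replicate_2), start=1):
        print(f"k={k}: {value:.3f}")
```

The index is corrected for chance: it equals 1 when the two subsets coincide and is close to 0 when their overlap is what independent random subsets of that size would produce. The size k = n is skipped because the denominator vanishes there.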




Updated: 2020-11-06