

Robust Feature Selection Technique Using Rank Aggregation
Applied Artificial Intelligence (IF 2.9), Pub Date: 2014-03-14, DOI: 10.1080/08839514.2014.883903
Chandrima Sarkar, Sarah Cooley, Jaideep Srivastava


Although feature selection is a well-developed research area, there is an ongoing need to develop methods that make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because every feature selection technique has its own statistical biases, whereas classifiers exploit different statistical properties of the data for evaluation. In many situations, this leaves researchers in a dilemma over which feature selection method and which classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a better solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, it improves classification accuracy by approximately 3–4% on datasets with fewer than 500 features and by more than 5% on datasets with more than 500 features, across a wide range of classifiers.
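
The abstract does not say which aggregation rule the authors use, so the following is only a minimal sketch of the general idea of rank aggregation, assuming a simple Borda count. The function name `borda_aggregate`, the feature names, and the three input rankings are all hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several feature rankings (best feature first) into one consensus ranking.

    Assumption: a plain Borda count, not necessarily the rule used in the paper.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The top feature of an n-item list earns n - 1 points, the last earns 0.
            scores[feature] += n - 1 - position
    # Sort by descending score; break ties by name so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings produced by three different feature selection methods.
rankings = [
    ["f1", "f2", "f3", "f4"],  # method A
    ["f2", "f1", "f4", "f3"],  # method B
    ["f1", "f3", "f2", "f4"],  # method C
]

consensus = borda_aggregate(rankings)
top_k = consensus[:2]  # keep the two features the methods agree on most
print(consensus, top_k)
```

In this sketch, a feature that individual methods rank inconsistently (such as `f3`) is pulled toward the middle of the consensus list, which is one way an aggregate ranking can dampen the bias of any single method.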


Updated: 2014-03-14
