Multi-label feature ranking with ensemble methods
Machine Learning (IF 7.5), Pub Date: 2020-10-13, DOI: 10.1007/s10994-020-05908-1
Matej Petković, Sašo Džeroski, Dragi Kocev

In this paper, we propose three ensemble-based feature ranking scores for multi-label classification (MLC), which is a generalisation of multi-class classification where the classes are not mutually exclusive. Each of the scores (Symbolic, Genie3 and Random forest) can be computed from three different ensembles of predictive clustering trees: Bagging, Random forest and Extra trees. We extensively evaluate the proposed scores on 24 benchmark MLC problems, using 15 standard MLC evaluation measures. For each ranking-ensemble pair, we determine the point at which ranking quality saturates as the ensemble grows, and show that high-quality rankings can be computed efficiently (typically, 10 or 50 trees suffice). We also show that the proposed feature rankings are relevant and determine the most appropriate ensemble method for every feature ranking score. We show empirically that the proposed feature ranking scores outperform current state-of-the-art methods in the quality of the rankings (for the majority of the evaluation measures) and in time efficiency. Finally, we determine the best performing feature ranking scores. Taking into account the quality of the rankings first and, in the case of ties, time efficiency, we identify the Genie3 feature ranking score as the optimal one.
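
The abstract does not define the scores, so the following is only a minimal sketch of the general idea behind an ensemble-based ranking for multi-label data, not the authors' implementation. It assumes that a Genie3-style score credits each feature with the size-weighted reduction in label variance at the nodes where it is used for splitting, averaged over a bagged ensemble. It uses small hand-rolled trees in place of predictive clustering trees, and all names (`label_variance`, `best_split`, `grow`, `rank_features`) are hypothetical.

```python
import random


def label_variance(Y):
    """Sum, over labels, of the variance of each 0/1 label column."""
    n = len(Y)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(len(Y[0])):
        p = sum(row[j] for row in Y) / n
        total += p * (1.0 - p)
    return total


def best_split(X, Y, features):
    """Return (feature, threshold, gain) for the split with the largest
    size-weighted variance reduction, or (None, None, 0.0) if none helps."""
    parent = len(X) * label_variance(Y)
    best = (None, None, 0.0)
    for f in features:
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in zip(X, Y) if x[f] <= t]
            right = [y for x, y in zip(X, Y) if x[f] > t]
            gain = (parent - len(left) * label_variance(left)
                    - len(right) * label_variance(right))
            if gain > best[2]:
                best = (f, t, gain)
    return best


def grow(X, Y, scores, rng, n_sub, depth):
    """Grow one tree recursively, adding each split's gain to its feature."""
    if depth == 0 or len(X) < 2:
        return
    candidates = rng.sample(range(len(X[0])), n_sub)
    f, t, gain = best_split(X, Y, candidates)
    if f is None:
        return
    scores[f] += gain
    for keep_left in (True, False):
        part = [(x, y) for x, y in zip(X, Y) if (x[f] <= t) == keep_left]
        grow([x for x, _ in part], [y for _, y in part],
             scores, rng, n_sub, depth - 1)


def rank_features(X, Y, n_trees=10, n_sub=None, max_depth=3, seed=0):
    """Average the per-feature gains over bootstrapped trees.

    Returns (scores, ranking), where ranking lists feature indices
    from most to least important.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_sub = n_sub or d  # assumption: all features tried at each node by default
    scores = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow([X[i] for i in idx], [Y[i] for i in idx],
             scores, rng, n_sub, max_depth)
    scores = [s / n_trees for s in scores]
    ranking = sorted(range(d), key=lambda f: -scores[f])
    return scores, ranking


if __name__ == "__main__":
    # Toy data: feature 0 determines both (non-exclusive) labels,
    # features 1 and 2 are noise.
    rng = random.Random(1)
    n = 40
    X = [[i / n, rng.random(), rng.random()] for i in range(n)]
    Y = [[int(x[0] >= 0.5), int(x[0] >= 0.25)] for x in X]
    scores, ranking = rank_features(X, Y, n_trees=10)
    print("scores:", [round(s, 3) for s in scores])
    print("ranking (best first):", ranking)
```

In this sketch, leaving `n_sub` at its default tries every feature at each node, which resembles bagging. Passing a smaller `n_sub` restricts each node to a random feature subset, in the spirit of a random forest. The default of 10 trees only mirrors the ensemble sizes the abstract reports as typically sufficient; it is not a tuned value.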

Updated: 2020-10-13