OLP++: An online local classifier for high dimensional data
Information Fusion (IF 14.7), Pub Date: 2022-09-16, DOI: 10.1016/j.inffus.2022.09.010
Mariana A. Souza, Robert Sabourin, George D.C. Cavalcanti, Rafael M.O. Cruz

Ensemble diversity is an important characteristic of Multiple Classifier Systems (MCS), which aim to improve the overall performance of a classification system by combining the responses of several models. While diversity may be introduced through various manipulations at the data and model levels, some MCSs incorporate local information in order to increase it and/or take advantage of it, based on the idea that the different classifiers in the ensemble may have expertise in distinct areas of the feature space. Following a similar reasoning, we introduced in a previous work an ensemble method which produces, at test time, a few experts in the local region where each given query sample is located. These local experts, which are generated with slightly differing views of the target area, are then used to label the corresponding unknown instance. While the framework was shown to perform well, especially over imbalanced problems, its locality definition is based on the nearest neighbors rule and the Euclidean distance, as is the case of various local-based ensembles, which may suffer from the effects of the curse of dimensionality on high dimensional problems. Thus, in this work, we propose a local ensemble method in which we leverage the data partitions given by decision trees for locality definition. More specifically, the partitions defined at different levels of the decision path that a given query instance traverses in the tree(s) are used as the regions over which the local experts are produced. By using different node levels from the path, each classifier in the local pool has a moderately distinct view of the target region without resorting to a dissimilarity metric, which might be susceptible to high dimensional spaces. Experimental results over 39 high dimensional problems showed that the proposed approach was significantly superior to our previous, distance-based framework in terms of balanced accuracy rate. Compared to six other local-based ensemble methods, including dynamic selection and weighting schemes, the proposed method achieved competitive results, outperforming the random forest baseline and two state-of-the-art dynamic ensemble selection techniques.
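As a rough illustration of the tree-based locality definition described above, the sketch below fits a single decision tree, follows the decision path of a query sample, and trains one local expert on the training samples falling into each of the last few internal nodes of that path. It is a minimal, hypothetical reconstruction using scikit-learn, not the authors' OLP++ implementation; the base learner (a Perceptron), the number of experts, and the majority-vote combination are illustrative assumptions.

```python
# Minimal sketch of tree-based local experts (illustrative, not the authors' OLP++ code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.tree import DecisionTreeClassifier


def predict_with_local_experts(tree, X_train, y_train, x_query, n_experts=3):
    """Label x_query by training local experts on the regions (tree nodes)
    along its decision path, then combining them by majority vote."""
    x_query = x_query.reshape(1, -1)

    # Node ids visited by the query, ordered from the root to the leaf.
    query_path = tree.decision_path(x_query).indices

    # Node membership of the training samples: indicator[i, j] == 1 iff
    # training sample i passes through node j.
    indicator = tree.decision_path(X_train)

    votes = []
    # The deepest internal nodes of the path define the most local regions;
    # using several levels gives each expert a slightly different view.
    for node in query_path[-(n_experts + 1):-1]:
        in_region = indicator[:, node].toarray().ravel().astype(bool)
        y_region = y_train[in_region]
        if np.unique(y_region).size < 2:
            # Pure region: every training sample agrees, so vote directly.
            votes.append(y_region[0])
            continue
        expert = Perceptron(random_state=0).fit(X_train[in_region], y_region)
        votes.append(expert.predict(x_query)[0])

    if not votes:  # degenerate path (e.g., single-node tree): fall back to the tree.
        return tree.predict(x_query)[0]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]


# Toy usage on synthetic data (dimensions and hyperparameters are arbitrary).
X, y = make_classification(n_samples=300, n_features=50, random_state=0)
tree = DecisionTreeClassifier(min_samples_leaf=5, random_state=0).fit(X, y)
print(predict_with_local_experts(tree, X, y, X[0]))
```

Note that, unlike distance-based locality, the regions here come directly from the tree's axis-aligned splits, so no dissimilarity metric over the high dimensional feature space is needed.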


