Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis
Pattern Recognition (IF 8), Pub Date: 2020-06-01, DOI: 10.1016/j.patcog.2020.107197
Sankha Subhra Mullick , Shounak Datta , Sourish Gunesh Dhekane , Swagatam Das

Abstract Indices quantifying the performance of classifiers under class imbalance often suffer from distortions that depend on the composition of the test set or on the class-specific classification accuracies, which makes it difficult to assess the merit of a classifier. We identify two fundamental conditions that a performance index must satisfy to be resilient, respectively, to changes in the number of test instances from each class and to changes in the number of classes in the test set. In light of these conditions, we theoretically analyze, under the effect of class imbalance, four indices commonly used for evaluating binary classifiers and five popular indices for multi-class classifiers. For indices that violate either condition, we also suggest remedial modifications and normalizations. We further investigate the capability of the indices to retain information about the classification performance over all the classes, even when the classifier exhibits extreme performance on some classes. Simulation studies are performed on high-dimensional deep representations of a subset of the ImageNet dataset using four state-of-the-art classifiers tailored for handling class imbalance. Finally, based on our theoretical findings and empirical evidence, we recommend the indices that should be used to evaluate the performance of classifiers in the presence of class imbalance.
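The kind of distortion described above can be illustrated with a small numerical sketch. Assuming a binary classifier whose class-wise accuracies (recall on each class) stay fixed, the code recomputes a few common indices while only the number of negative test instances changes; a second part shows how a geometric mean of class-wise recalls can lose information when one class is missed entirely. The function names `binary_indices`, `mean_recall`, and `gmean_recall` are hypothetical, and the choice of indices is an illustrative assumption: this is not the paper's analysis, its exact index definitions, or its proposed normalizations, and the abstract does not say which indices it finally recommends.

```python
import math
from typing import Dict, Sequence


def binary_indices(tpr: float, tnr: float, n_pos: int, n_neg: int) -> Dict[str, float]:
    """Common binary indices for a classifier with fixed class-wise accuracies.

    tpr: recall (accuracy) on the positive class.
    tnr: recall (accuracy) on the negative class.
    n_pos, n_neg: number of positive / negative test instances.
    """
    tp = tpr * n_pos  # positives classified correctly
    fn = n_pos - tp   # positives labelled negative
    tn = tnr * n_neg  # negatives classified correctly
    fp = n_neg - tn   # negatives labelled positive
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    f1_den = 2 * tp + fp + fn
    return {
        # Weighted by class proportions, so it moves with test-set composition.
        "accuracy": (tp + tn) / (n_pos + n_neg),
        # Depends on how many negatives exist to be misclassified.
        "precision": precision,
        "f1": 2 * tp / f1_den if f1_den > 0 else 0.0,
        # Built only from class-wise recalls, so class counts cancel out.
        "balanced_accuracy": (tpr + tnr) / 2,
        "gmean": math.sqrt(tpr * tnr),
    }


def mean_recall(recalls: Sequence[float]) -> float:
    """Arithmetic mean of class-wise recalls (multi-class balanced accuracy)."""
    return sum(recalls) / len(recalls)


def gmean_recall(recalls: Sequence[float]) -> float:
    """Geometric mean of class-wise recalls; 0 as soon as any recall is 0."""
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    # Same classifier (tpr=0.9, tnr=0.8), two test sets: 100 vs 1000 negatives.
    for n_neg in (100, 1000):
        scores = binary_indices(0.9, 0.8, n_pos=100, n_neg=n_neg)
        print(n_neg, {name: round(value, 3) for name, value in scores.items()})

    # Multi-class: one class is missed entirely in both cases.
    for recalls in ([0.95, 0.90, 0.0], [0.30, 0.30, 0.0]):
        print(recalls, round(mean_recall(recalls), 3), gmean_recall(recalls))
```

With 100 negatives, accuracy is 0.85 and F1 is about 0.857; with 1000 negatives, accuracy falls to about 0.809 and F1 to about 0.462, although the classifier's behaviour on each class is unchanged. Balanced accuracy (0.85) and the G-mean (about 0.849) stay the same because they depend only on class-wise recalls. In the multi-class part, the G-mean is 0 for both recall vectors, so it cannot separate a classifier that does well on two of three classes from one that does poorly on all of them, while the arithmetic mean (about 0.617 vs 0.2) still does. These are direct consequences of the definitions, shown here only to make the abstract's concerns concrete.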

Updated: 2020-06-01