F-Measure Curves: A Tool to Visualize Classifier Performance Under Imbalance
Pattern Recognition (IF 8) Pub Date: 2020-04-01, DOI: 10.1016/j.patcog.2019.107146
Roghayeh Soleymani , Eric Granger , Giorgio Fumera

Abstract Learning from imbalanced data is a challenging problem in many real-world machine learning applications, due in part to the performance bias of most classification systems. This bias can arise for three reasons: (1) classification systems are often optimized and compared using performance measures that are unsuitable for imbalance problems; (2) most learning algorithms are designed and tested on data with a fixed imbalance level, which may differ from operational scenarios; (3) the preference for correctly classifying each class differs from one application to another. This paper investigates specialized performance evaluation metrics and tools for imbalance problems, including scalar metrics that assume a given operating condition (skew level and relative preference of classes), and global evaluation curves or metrics that consider a range of operating conditions. We focus on the case in which the scalar F-measure is preferred over other scalar metrics, and propose a new global evaluation space for the F-measure that is analogous to the cost curves for expected cost. In this space, a classifier is represented as a curve that shows its performance over all of its decision thresholds and a range of possible imbalance levels, for a desired preference of true positive rate to precision. Curves obtained in the F-measure space are compared to those in existing spaces (ROC, precision-recall, and cost), and are interpreted analogously to cost curves. The proposed F-measure space makes it easier to visualize and compare classifiers' performance under different operating conditions than the ROC and precision-recall spaces. This space allows us to set the optimal decision threshold of a soft classifier and to select the best classifier among a group. It also makes it possible to empirically improve the performance obtained with ensemble learning methods specialized for class imbalance, by selecting and combining the base classifiers of an ensemble using a modified version of the iterative Boolean combination algorithm that is optimized using the F-measure instead of the AUC. Experiments on a real-world dataset for video face recognition show the advantages of evaluating and comparing different classifiers in the F-measure space versus the ROC, precision-recall, and cost spaces. In addition, it is shown that the F-measure performance of the Bagging ensemble method can be improved considerably by using the modified iterative Boolean combination algorithm.
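
One way to illustrate the kind of visualization the abstract describes is to plot, for a single soft classifier, the best achievable F-measure across a range of imbalance levels. The sketch below is a hypothetical example, not the authors' code: it uses synthetic classifier scores, derives the F-measure from TPR, FPR, and an assumed positive-class prior, and sweeps the decision threshold at each prior. The paper's space combines skew and class preference into a single axis, which this simplified sketch does not reproduce.

```python
# Minimal sketch (illustrative assumptions throughout): F-measure of one soft
# classifier as a function of the positive-class prior (imbalance level),
# with the decision threshold optimized at each prior.
import numpy as np
import matplotlib.pyplot as plt

def f_measure(tpr, fpr, prior, beta=1.0):
    """F-measure expressed via TPR, FPR and the positive-class prior.
    Uses precision = prior*TPR / (prior*TPR + (1-prior)*FPR)."""
    num = (1 + beta**2) * prior * tpr
    den = beta**2 * prior + prior * tpr + (1 - prior) * fpr
    return num / den  # den > 0 whenever prior > 0

rng = np.random.default_rng(0)
# Toy scores: positives score higher than negatives on average.
pos_scores = rng.normal(1.0, 1.0, 500)
neg_scores = rng.normal(0.0, 1.0, 5000)

thresholds = np.linspace(-3.0, 4.0, 200)
tpr = np.array([(pos_scores >= t).mean() for t in thresholds])
fpr = np.array([(neg_scores >= t).mean() for t in thresholds])

priors = np.linspace(0.01, 0.5, 100)                     # imbalance levels
best_f = [f_measure(tpr, fpr, p).max() for p in priors]  # best threshold per prior

plt.plot(priors, best_f)
plt.xlabel("positive-class prior (imbalance level)")
plt.ylabel("best achievable F1")
plt.title("F-measure curve of a single soft classifier (toy data)")
plt.show()
```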

Updated: 2020-04-01