Cost-sensitive design of quadratic discriminant analysis for imbalanced data
Pattern Recognition Letters (IF 3.9), Pub Date: 2021-06-12, DOI: 10.1016/j.patrec.2021.06.002
Amine Bejaoui, Abla Kammoun, Mohamed Slim Alouini

Learning from imbalanced training data represents a major challenge that has triggered recent interest from both academia and industry. As far as classification is concerned, it has been observed that several algorithms provide low accuracy when designed out of imbalanced data sets, among which regularized quadratic discriminant analysis (R-QDA) is the most illustrative example. Based on recent asymptotic findings, the study in [2] has brought a better understanding of the reasons behind the excessive sensitivity of R-QDA to data imbalance, which allowed for the development of a novel quadratic based classifier that presents higher robustness to such scenarios. However, the selection of the parameters for this classifier relied on the minimization of the overall classification error rate, which is not considered as a relevant performance metric in extremely imbalanced training data. In this work, we follow a multi-model selection approach for the selection of the parameters of the classifier proposed in [2]. Such an approach involves solving a multi-objective optimization problem, but, contrary to related works, we do not resort to evolutionary algorithms to solve this problem but rather to a solely training data dependent technique based on asymptotic approximations for the classification performances. This allows us to transform the multi-objective optimization problem into a scalar optimization problem. Our proposed approach presents the main advantages of being more accurate and less complex, avoiding the need for computationally expensive cross-validation procedures. Its interest goes beyond the quadratic discriminant analysis, paving the way towards a principled method for the design of classification algorithms in imbalanced data scenarios.
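
As a rough illustration of turning the multi-objective problem into a scalar one, the sketch below combines the two class-wise error rates into a single weighted cost and picks the regularization parameter that minimizes it. It is only a minimal sketch under stated assumptions: the abstract does not give the asymptotic error estimates or the exact classifier from [2], so this example substitutes empirical error rates on a held-out set and a plain regularized QDA (covariance plus `gamma` times the identity). All function names, the weights `w0` and `w1`, and the weighted-sum form of the cost are hypothetical choices made for illustration.

```python
import numpy as np

def fit_rqda(X, y, gamma):
    """Per-class mean, regularized covariance (cov + gamma * I), and log prior.

    Assumption: this simple ridge-style regularization stands in for the
    classifier of [2], whose exact form is not given in the abstract.
    """
    model = {}
    for c in (0, 1):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False) + gamma * np.eye(X.shape[1])
        model[c] = (Xc.mean(axis=0), cov, np.log(len(Xc) / len(X)))
    return model

def predict(model, X):
    """Assign each row of X to the class with the larger quadratic score."""
    scores = []
    for c in (0, 1):
        mu, cov, log_prior = model[c]
        diff = X - mu
        # Row-wise Mahalanobis distance diff_i^T cov^{-1} diff_i.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * maha - 0.5 * logdet + log_prior)
    return np.argmax(np.stack(scores, axis=1), axis=1)

def class_errors(y_true, y_pred):
    """Return (eps0, eps1): the misclassification rate within each class."""
    return tuple(float(np.mean(y_pred[y_true == c] != c)) for c in (0, 1))

def scalar_cost(eps0, eps1, w0=0.5, w1=0.5):
    """Hypothetical scalarization: weighted sum of the class-wise errors."""
    return w0 * eps0 + w1 * eps1

def select_gamma(X_tr, y_tr, X_val, y_val, grid, w0=0.5, w1=0.5):
    """Pick the gamma in grid with the smallest scalar cost on held-out data.

    Assumption: the paper estimates the errors from training data via
    asymptotic approximations; empirical held-out errors are a stand-in here.
    """
    costs = []
    for g in grid:
        eps0, eps1 = class_errors(y_val, predict(fit_rqda(X_tr, y_tr, g), X_val))
        costs.append(scalar_cost(eps0, eps1, w0, w1))
    best = int(np.argmin(costs))
    return grid[best], costs[best]
```

With equal weights the cost is the average of the two class-wise error rates, so a classifier that ignores the minority class is penalized even when its overall error rate is low. Raising `w1` relative to `w0` makes errors on class 1 more expensive.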




Updated: 2021-06-28