On the Rates of Convergence From Surrogate Risk Minimizers to the Bayes Optimal Classifier
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4). Pub Date: 2021-04-21. DOI: 10.1109/tnnls.2021.3071370
Jingwei Zhang, Tongliang Liu, Dacheng Tao

In classification, the use of 0–1 loss is preferable because the minimizer of the 0–1 risk is the Bayes optimal classifier. However, due to the nonconvexity of the 0–1 loss, this optimization problem is NP-hard, and many convex surrogate loss functions have therefore been adopted. Previous works have shown that if a Bayes-risk-consistent loss function is used as a surrogate, the minimizer of the empirical surrogate risk converges to the Bayes optimal classifier as the sample size tends to infinity. Nevertheless, the rates at which minimizers of different empirical surrogate risks converge to the Bayes optimal classifier have rarely been compared. Which characterization of a surrogate loss determines its convergence rate to the Bayes optimal classifier? Can we modify the loss function to achieve a faster convergence rate? In this article, we study the convergence rates of empirical surrogate minimizers to the Bayes optimal classifier. Specifically, we introduce the notions of consistency intensity and conductivity to characterize a surrogate loss function and exploit these notions to obtain the rate of convergence from an empirical surrogate risk minimizer to the Bayes optimal classifier, enabling fair comparisons of the excess risks of different surrogate risk minimizers. The main result of this article has practical implications, including: 1) showing that hinge loss (SVM) is superior to logistic loss (logistic regression) and exponential loss (AdaBoost) in the sense that its empirical minimizer converges faster to the Bayes optimal classifier and 2) guiding the design of new loss functions to speed up the convergence rate to the Bayes optimal classifier with a data-dependent loss correction method inspired by our theorems.
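For readers unfamiliar with the surrogate losses named above, the following minimal sketch (not taken from the paper) shows their standard margin-based definitions alongside the 0–1 loss, and computes the corresponding empirical risks on toy data; all function names and the data are illustrative.

import numpy as np

def hinge_loss(margin):        # surrogate used by SVM
    return np.maximum(0.0, 1.0 - margin)

def logistic_loss(margin):     # surrogate used by logistic regression
    return np.log(1.0 + np.exp(-margin))

def exponential_loss(margin):  # surrogate used by AdaBoost
    return np.exp(-margin)

def zero_one_loss(margin):     # target loss; its risk minimizer is the Bayes optimal classifier
    return (margin <= 0).astype(float)

def empirical_risk(loss, scores, labels):
    # labels in {-1, +1}; the margin is y * f(x)
    return np.mean(loss(labels * scores))

# Toy comparison of the empirical surrogate risks of one scoring function.
rng = np.random.default_rng(0)
scores = rng.normal(size=100)
labels = np.where(rng.normal(size=100) > 0, 1.0, -1.0)
for name, loss in [("hinge", hinge_loss), ("logistic", logistic_loss),
                   ("exponential", exponential_loss), ("0-1", zero_one_loss)]:
    print(name, empirical_risk(loss, scores, labels))

The paper's question is how quickly the minimizer of each such empirical surrogate risk approaches the Bayes optimal classifier as the sample size grows, not merely whether it does.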

Updated: 2021-04-21