Sharp Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting
arXiv - CS - Machine Learning. Pub Date: 2020-01-19. arXiv:2001.06892
Tianyang Hu, Zuofeng Shang, Guang Cheng

Classifiers built with neural networks handle large-scale, high-dimensional data, such as facial images in computer vision, extremely well, while traditional statistical methods often fail miserably. In this paper, we attempt to understand this empirical success in high-dimensional classification by deriving the convergence rate of the excess risk. In particular, we propose a teacher-student framework that assumes the Bayes classifier can be expressed as a ReLU neural network. In this setup, we obtain a sharp rate of convergence, namely $\tilde{O}_d(n^{-2/3})$, for classifiers trained with either the 0-1 loss or the hinge loss. This rate improves further to $\tilde{O}_d(n^{-1})$ when the data distribution is separable. Here, $n$ denotes the sample size. An interesting observation is that the data dimension contributes only to the $\log(n)$ term in these rates. This may provide one theoretical explanation for the empirical success of deep neural networks in high-dimensional classification, particularly for structured data.
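To make the statement concrete, recall the standard definitions (schematic; the exact constants and logarithmic powers are given in the paper and not reproduced here): writing $R(f) = \mathbb{P}(Y \neq \mathrm{sign}\, f(X))$ for the 0-1 risk and $f^*$ for the Bayes classifier, the excess risk of an estimator $\hat{f}_n$ is $\mathcal{E}(\hat{f}_n) = R(\hat{f}_n) - R(f^*)$, and a rate of $\tilde{O}_d(n^{-2/3})$ means $\mathcal{E}(\hat{f}_n) \lesssim n^{-2/3} (\log n)^{r_d}$, where, per the abstract, the dimension $d$ enters only through the logarithmic factor.

The minimal sketch below illustrates the teacher-student data-generating process in the noiseless (separable) regime: labels come from a fixed "teacher" ReLU network, so the Bayes classifier is $\mathrm{sign}(f^*(x))$ and the Bayes risk is zero. The network sizes, the uniform input law, and the Monte Carlo evaluation are illustrative assumptions, not the paper's exact setup.

    # Teacher-student sketch (illustrative assumptions throughout).
    import numpy as np

    rng = np.random.default_rng(0)
    d, width = 32, 16  # input dimension and hidden width: hypothetical choices

    # Fixed teacher: one-hidden-layer ReLU network f*(x) = v . relu(W x + b).
    W = rng.normal(size=(width, d))
    b = rng.normal(size=width)
    v = rng.normal(size=width)

    def teacher(x):
        # The Bayes classifier in this setting is sign(teacher(x)).
        return np.maximum(W @ x + b, 0.0) @ v

    # Training sample: in the separable regime, y = sign(f*(x)) with no label noise.
    n = 1000
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = np.sign(np.array([teacher(x) for x in X]))

    def excess_01_risk(student, m=5000):
        # Monte Carlo estimate of R(student) - R(f*); here R(f*) = 0.
        Xt = rng.uniform(-1.0, 1.0, size=(m, d))
        yt = np.sign(np.array([teacher(x) for x in Xt]))
        pred = np.sign(np.array([student(x) for x in Xt]))
        return float(np.mean(pred != yt))

    print(excess_01_risk(teacher))          # 0.0: the teacher attains the Bayes risk
    print(excess_01_risk(lambda x: x[0]))   # a naive student pays a positive excess risk

A student trained by minimizing the empirical hinge loss (or 0-1 loss) over a suitably sized ReLU network class would, by the paper's result, drive this quantity down at the rate $\tilde{O}_d(n^{-1})$ in this separable setting.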

Updated: 2020-02-04