Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function”
Annals of Statistics (IF 3.2). Pub Date: 2020-08-01. DOI: 10.1214/19-aos1910
Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

We congratulate Johannes Schmidt-Hieber for his elegant and thought-provoking results. His article uses deep-learning-inspired methods in the context of nonparametric regression. Schmidt-Hieber defines a rich class of composition-based functions G(q, d, t, β, K) and a class of sparse multi-layer neural networks F(L, p, s, F). He proves that least squares estimation over the class of sparse neural networks (with suitably chosen architecture (L, p, s, F)) achieves nearly minimax prediction error over G(q, d, t, β, K). The modeling and analysis in this paper are both elegant and original. They trigger a natural question: how much of the empirical success of deep learning can be understood using this model? As a way to stimulate reflection on this question, we will discuss three challenges: 1. sparsity and generalization; 2. curse of dimensionality; 3. computation. Throughout, we will denote by $\varepsilon^* = \min_{0 \le i \le q} \left[ 2\beta_i^*/(2\beta_i^* + t_i) \right] \in (0, 1)$ the minimax exponent of the class G(q, d, t, β, K). Also, our discussion focuses on multi-layer perceptrons; in particular, we exclude convolutional networks. The latter have an entirely different structure and do not fall within the scope of the present paper.
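For concreteness, here is a minimal Python sketch (our own illustration, not code from the discussion or the original article) of how the exponent ε* is computed for a given composition class, assuming the effective smoothness β*_i = β_i ∏_{ℓ=i+1}^q min(β_ℓ, 1) as defined in Schmidt-Hieber's paper; the helper name `minimax_exponent` is ours.

```python
# Illustrative sketch: compute the minimax exponent eps* for the
# composition class G(q, d, t, beta, K).  Assumes the effective
# smoothness beta*_i = beta_i * prod_{l=i+1}^q min(beta_l, 1)
# from Schmidt-Hieber's paper; `minimax_exponent` is a hypothetical name.

def minimax_exponent(beta, t):
    """Return eps* = min_i 2*beta*_i / (2*beta*_i + t_i).

    beta: smoothness indices (beta_0, ..., beta_q)
    t:    effective input dimensions (t_0, ..., t_q)
    """
    q = len(beta) - 1
    exponents = []
    for i in range(q + 1):
        # Effective smoothness of layer i, discounted by the
        # rough (beta < 1) layers composed on top of it.
        beta_star = beta[i]
        for ell in range(i + 1, q + 1):
            beta_star *= min(beta[ell], 1.0)
        exponents.append(2.0 * beta_star / (2.0 * beta_star + t[i]))
    # The hardest layer of the composition determines the rate.
    return min(exponents)


# Example: a two-layer composition with smoothness (beta_0, beta_1) = (2, 1)
# and effective dimensions (t_0, t_1) = (5, 1).  Then beta*_0 = 2 and
# beta*_1 = 1, giving exponents 4/9 and 2/3, so eps* = 4/9: the risk
# scales as n^{-4/9} up to logarithmic factors.
print(minimax_exponent(beta=[2.0, 1.0], t=[5, 1]))  # 0.444...
```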
