Optimal approximation rate of ReLU networks in terms of width and depth
Journal de Mathématiques Pures et Appliquées (IF 2.3), Pub Date: 2021-07-16, DOI: 10.1016/j.matpur.2021.07.009
Zuowei Shen, Haizhao Yang, Shijun Zhang

This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $O(\max\{d \lfloor N^{1/d} \rfloor, N + 2\})$ and depth $O(L)$ can approximate a Hölder continuous function on $[0, 1]^{d}$ with an approximation rate $O(\lambda \sqrt{d} (N^{2} L^{2} \ln N)^{-\alpha/d})$, where $\alpha \in (0, 1]$ and $\lambda > 0$ are the Hölder order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, while existing results are only nearly optimal without the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0, 1]^{d}$, the approximation rate becomes $O(\sqrt{d}\, \omega_{f}((N^{2} L^{2} \ln N)^{-1/d}))$, where $\omega_{f}(\cdot)$ is the modulus of continuity. We also extend our analysis to any continuous function $f$ on a bounded set. In particular, if ReLU networks with depth 31 and width $O(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0, 1]$ with a Lipschitz constant $\lambda > 0$, the approximation rate in terms of the total number of parameters, $W = O(N^{2})$, becomes $O(\frac{\lambda}{W \ln W})$, which has not been discovered in the literature for fixed-depth ReLU networks.
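To get a feel for how the stated bounds scale, the expressions inside the $O(\cdot)$ terms can be evaluated numerically. The sketch below is only an illustration: the function names (`floor_root`, `width_scale`, `holder_rate_scale`) are hypothetical and not taken from the paper, and the constants hidden by the $O(\cdot)$ notation are not known from the abstract, so the returned values show scaling behaviour rather than actual error bounds.

```python
import math


def floor_root(N: int, d: int) -> int:
    """Return floor(N ** (1/d)) using integer correction to avoid float rounding."""
    r = int(round(N ** (1.0 / d)))
    while r ** d > N:
        r -= 1
    while (r + 1) ** d <= N:
        r += 1
    return r


def width_scale(N: int, d: int) -> int:
    """Expression inside the O(.) for the width: max{d * floor(N^(1/d)), N + 2}."""
    return max(d * floor_root(N, d), N + 2)


def holder_rate_scale(N: int, L: int, d: int, alpha: float, lam: float) -> float:
    """Expression inside the O(.) for the rate:
    lam * sqrt(d) * (N^2 * L^2 * ln N)^(-alpha / d).

    The hidden constant is NOT included (it is not given in the abstract).
    """
    if N < 2:
        raise ValueError("N must be at least 2 so that ln N > 0")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return lam * math.sqrt(d) * (N ** 2 * L ** 2 * math.log(N)) ** (-alpha / d)


if __name__ == "__main__":
    # One-dimensional Lipschitz case (d = 1, alpha = 1) with the depth factor held at 1.
    N, lam = 100, 1.0
    rate = holder_rate_scale(N, L=1, d=1, alpha=1.0, lam=lam)
    # Assumption for illustration: W is exactly N^2 (the abstract only states W = O(N^2)).
    W = N ** 2
    print(rate, 2 * lam / (W * math.log(W)))
```

Under the illustrative assumption $W = N^{2}$, we have $\ln W = 2 \ln N$, so $\lambda / (N^{2} \ln N) = 2\lambda / (W \ln W)$; the two printed values therefore agree, which is consistent with the $O(\frac{\lambda}{W \ln W})$ rate quoted for fixed-depth networks.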

Updated: 2021-07-16