Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth
Neural Computation (IF 2.7), Pub Date: 2021-01-29, DOI: 10.1162/neco_a_01364
Zuowei Shen, Haizhao Yang, Shijun Zhang

A new network with super-approximation power is introduced. This network is built with either the Floor (⌊x⌋) or the ReLU (max{0, x}) activation function in each neuron; hence, we call such networks Floor-ReLU networks. For any hyperparameters N ∈ ℕ⁺ and L ∈ ℕ⁺, we show that Floor-ReLU networks with width max{d, 5N + 13} and depth 64dL + 3 can uniformly approximate a Hölder function f on [0,1]^d with an approximation error 3λ d^(α/2) N^(−α√L), where α ∈ (0,1] and λ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function f on [0,1]^d with a modulus of continuity ω_f(·), the constructive approximation rate is ω_f(√d · N^(−√L)) + 2 ω_f(√d) N^(−√L). As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of ω_f(r) as r → 0 is moderate (e.g., ω_f(r) ≲ r^α for Hölder continuous functions), since the major term to be considered in our approximation rate is essentially √d times a function of N and L, independent of d, within the modulus of continuity.
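To make the stated bound concrete, the sketch below evaluates the width, depth, and claimed uniform error 3λ d^(α/2) N^(−α√L) for a few choices of L. The function name `floor_relu_error_bound` is our own illustrative label, not from the paper; the arithmetic simply transcribes the formulas quoted in the abstract.

```python
import math

def floor_relu_error_bound(d, N, L, alpha=1.0, lam=1.0):
    """Width, depth, and claimed uniform approximation error of a
    Floor-ReLU network for a Hölder function of order alpha and
    constant lam on [0,1]^d, per the abstract's formulas:
    width = max{d, 5N+13}, depth = 64dL+3,
    error = 3 * lam * d^(alpha/2) * N^(-alpha*sqrt(L))."""
    width = max(d, 5 * N + 13)
    depth = 64 * d * L + 3
    error = 3 * lam * d ** (alpha / 2) * N ** (-alpha * math.sqrt(L))
    return width, depth, error

# Depth enters the rate only through sqrt(L) in the exponent of 1/N,
# so quadrupling L squares the factor N^(-sqrt(L)):
for L in (1, 4, 16):
    width, depth, err = floor_relu_error_bound(d=10, N=10, L=L)
    print(f"L={L:2d}  width={width}  depth={depth}  error={err:.3e}")
```

For d = 10, N = 10, α = λ = 1, the error is 3√10 · 10^(−√L), so it shrinks from roughly 0.95 at L = 1 to roughly 9.5 × 10⁻⁴ at L = 16 while the width stays fixed at 63, which is the "reciprocal of width to the power of square root of depth" behavior named in the title.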




Updated: 2021-01-31