Phase Transitions in Rate Distortion Theory and Deep Learning
Foundations of Computational Mathematics (IF 2.5), Pub Date: 2021-11-16, DOI: 10.1007/s10208-021-09546-4
Philipp Grohs, Andreas Klotz, Felix Voigtlaender

Rate distortion theory is concerned with optimally encoding signals from a given signal class \(\mathcal {S}\) using a budget of R bits, as \(R \rightarrow \infty \). We say that \(\mathcal {S}\) can be compressed at rate s if we can achieve an error of at most \(\mathcal {O}(R^{-s})\) for encoding the given signal class; the supremal compression rate is denoted by \(s^*(\mathcal {S})\). Given a fixed coding scheme, there usually are some elements of \(\mathcal {S}\) that are compressed at a higher rate than \(s^*(\mathcal {S})\) by the given coding scheme; in this paper, we study the size of this set of signals. We show that for certain “nice” signal classes \(\mathcal {S}\), a phase transition occurs: We construct a probability measure \(\mathbb {P}\) on \(\mathcal {S}\) such that for every coding scheme \(\mathcal {C}\) and any \(s > s^*(\mathcal {S})\), the set of signals encoded with error \(\mathcal {O}(R^{-s})\) by \(\mathcal {C}\) forms a \(\mathbb {P}\)-null-set. In particular, our results apply to all unit balls in Besov and Sobolev spaces that embed compactly into \(L^2 (\varOmega )\) for a bounded Lipschitz domain \(\varOmega \). As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are in fact generically sharp. In addition, we provide quantitative and non-asymptotic bounds on the probability that a random \(f\in \mathcal {S}\) can be encoded to within accuracy \(\varepsilon \) using R bits. This result is subsequently applied to the problem of approximately representing \(f\in \mathcal {S}\) to within accuracy \(\varepsilon \) by a (quantized) neural network with at most W nonzero weights. We show that for any \(s > s^*(\mathcal {S})\) there are constants c, C such that, no matter what kind of “learning” procedure is used to produce such a network, the probability of success is bounded from above by \(\min \big \{1, 2^{C\cdot W \lceil \log _2 (1+W) \rceil ^2 - c\cdot \varepsilon ^{-1/s}} \big \}\).
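
As a rough numerical illustration of the final bound, the sketch below evaluates \(\min \big \{1, 2^{C\cdot W \lceil \log _2 (1+W) \rceil ^2 - c\cdot \varepsilon ^{-1/s}} \big \}\) with placeholder constants c = C = 1 (the abstract does not specify their values). It is only meant to show the qualitative phase transition: the bound stays essentially 0 until W, up to polylogarithmic factors, reaches the order of \(\varepsilon ^{-1/s}\), and then becomes the trivial value 1.

    import math

    def success_prob_upper_bound(W, eps, s, c=1.0, C=1.0):
        """Evaluate min{1, 2^(C*W*ceil(log2(1+W))^2 - c*eps^(-1/s))}, the stated
        upper bound on the probability that a P-random signal can be represented
        to accuracy eps by a quantized network with at most W nonzero weights,
        for any s > s*(S). The constants c, C are hypothetical placeholders."""
        exponent = C * W * math.ceil(math.log2(1 + W)) ** 2 - c * eps ** (-1.0 / s)
        if exponent >= 0:
            return 1.0              # the minimum with 1 makes the bound trivial
        return 2.0 ** exponent      # may underflow to 0.0 for very negative exponents

    # With eps = 1e-3 and s = 0.5, eps^(-1/s) = 10^6: the bound is essentially 0
    # until W * ceil(log2(1+W))^2 exceeds about 10^6, then jumps to 1.
    for W in (10, 100, 1000, 10000):
        print(W, success_prob_upper_bound(W, eps=1e-3, s=0.5))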




Updated: 2021-11-17