Topological Properties of the Set of Functions Generated by Neural Networks of Fixed Size
Foundations of Computational Mathematics (IF 3) Pub Date: 2020-05-14, DOI: 10.1007/s10208-020-09461-0
Philipp Petersen, Mones Raslan, Felix Voigtlaender

We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to \(L^p\)-norms, \(0< p < \infty \), for all practically used activation functions, and also not closed with respect to the \(L^\infty \)-norm for all practically used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if \(f_1, f_2\) are two functions realized by neural networks and if \(f_1, f_2\) are close in the sense that \(\Vert f_1 - f_2\Vert _{L^\infty } \le \varepsilon \) for \(\varepsilon > 0\), it is, regardless of the size of \(\varepsilon \), usually not possible to find weights \(w_1, w_2\) close together such that each \(f_i\) is realized by a neural network with weights \(w_i\). Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.
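The non-closedness claim can be illustrated with a standard one-neuron example (a minimal numpy sketch, not a construction from the paper): with the sigmoid activation \(\sigma\), each function \(x \mapsto \sigma(nx)\) is realized by a single neuron, yet on \([-1,1]\) the sequence converges in \(L^p\), \(0< p < \infty \), to the discontinuous Heaviside step function, which no fixed-size network with a continuous activation can realize; the \(L^\infty \) distance, by contrast, stays at \(1/2\).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One-neuron sigmoid networks f_n(x) = sigmoid(n * x) on [-1, 1],
# compared with the discontinuous Heaviside step function H.
x = np.linspace(-1.0, 1.0, 200_001)
dx = x[1] - x[0]
H = (x > 0).astype(float)

p = 2  # any 0 < p < infinity exhibits the same behavior
for n in [1, 10, 100, 1000]:
    f_n = sigmoid(n * x)
    lp_dist = (np.sum(np.abs(f_n - H) ** p) * dx) ** (1.0 / p)  # tends to 0 as n grows
    sup_dist = np.max(np.abs(f_n - H))                          # stays near 1/2
    print(f"n = {n:5d}   L^{p} distance ~ {lp_dist:.4f}   L^inf distance ~ {sup_dist:.4f}")
```

The limit H thus lies in the \(L^p\)-closure of the set of one-neuron sigmoid networks but not in the set itself, while the constant \(L^\infty \) gap of \(1/2\) shows that this particular sequence does not witness a failure of \(L^\infty \)-closedness.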



Updated: 2020-05-14