Robust and Resource-Efficient Identification of Two Hidden Layer Neural Networks
Constructive Approximation (IF 2.3), Pub Date: 2021-06-30, DOI: 10.1007/s00365-021-09550-5
Massimo Fornasier, Timo Klock, Michael Rauchensteiner

We address the structure identification and the uniform approximation of fully nonlinear two-hidden-layer neural networks of the type \(f(x)=1^T h(B^T g(A^T x))\) on \(\mathbb R^d\), where \(g=(g_1,\dots , g_{m_0})\), \(h=(h_1,\dots , h_{m_1})\), \(A=(a_1|\dots |a_{m_0}) \in \mathbb R^{d \times m_0}\), and \(B=(b_1|\dots |b_{m_1}) \in \mathbb R^{m_0 \times m_1}\), from a small number of query samples. The solution of the two-hidden-layer case presented in this paper is crucial, as it can be further generalized to deeper neural networks. We approach the problem by actively sampling finite difference approximations to Hessians of the network. Gathering several approximate Hessians allows us to reliably approximate the matrix subspace \(\mathcal W\) spanned by the symmetric tensors \(a_1 \otimes a_1,\dots ,a_{m_0}\otimes a_{m_0}\) formed by the weights of the first layer, together with the entangled symmetric tensors \(v_1 \otimes v_1 ,\dots ,v_{m_1}\otimes v_{m_1}\) formed by suitable combinations of the weights of the first and second layer as \(v_\ell =A G_0 b_\ell /\Vert A G_0 b_\ell \Vert _2\), \(\ell \in [m_1]\), for a diagonal matrix \(G_0\) depending on the activation functions of the first layer. The identification of the rank-one symmetric tensors within \(\mathcal W\) is then performed by solving a robust nonlinear program that maximizes the spectral norm of the competitors constrained to the unit Frobenius sphere. We provide guarantees of stable recovery under a posteriori verifiable conditions. Once the rank-one symmetric tensors \(\{a_i \otimes a_i, i\in [m_0]\}\cup \{v_\ell \otimes v_\ell , \ell \in [m_1] \}\) are computed, we address their correct attribution to the first or second layer (the \(a_i\)'s are attributed to the first layer). The attribution to the layers is currently based on a semi-heuristic reasoning, but it shows clear potential for reliable execution.
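As an illustrative sketch of the active-sampling step (not the paper's full pipeline), the snippet below evaluates a network of this form and approximates its Hessian at a query point by second-order central finite differences. The choice of tanh activations, the dimensions, and the random weights are assumptions for illustration only. Since the Hessian of \(f\) has the form \(A M A^T\) for some \(m_0 \times m_0\) matrix \(M\), its numerical rank is at most \(m_0\), which is what makes the gathered Hessians informative about the subspace \(\mathcal W\).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m0, m1 = 5, 3, 2

# Hypothetical weights and activations; the paper treats general g_i, h_l.
A = rng.standard_normal((d, m0))
B = rng.standard_normal((m0, m1))
g = np.tanh  # stand-in for the componentwise map g = (g_1, ..., g_{m0})
h = np.tanh  # stand-in for the componentwise map h = (h_1, ..., h_{m1})

def f(x):
    """Two-hidden-layer network f(x) = 1^T h(B^T g(A^T x))."""
    return np.sum(h(B.T @ g(A.T @ x)))

def approx_hessian(x, eps=1e-4):
    """Approximate the Hessian of f at x by central finite differences:
    H[i,j] ≈ (f(x+e_i+e_j) - f(x+e_i-e_j) - f(x-e_i+e_j) + f(x-e_i-e_j)) / (4 eps^2),
    with e_i = eps * (i-th coordinate direction)."""
    H = np.zeros((d, d))
    I = np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * (I[i] - I[j]))
                       - f(x - eps * (I[i] - I[j])) + f(x - eps * (I[i] + I[j]))) / (4 * eps**2)
    return H
```

Each query point costs \(O(d^2)\) network evaluations; collecting such Hessians at several points yields symmetric matrices that (approximately) lie in \(\mathcal W\).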
Having the correct attribution of the \(a_i, v_\ell \) to their respective layers and the consequent de-parametrization of the network, it is possible, by using a suitably adapted gradient descent iteration, to estimate, up to intrinsic symmetries, the shifts of the activation functions of the first layer and to compute the matrix \(G_0\) exactly. Eventually, from the vectors \(v_\ell =A G_0 b_\ell /\Vert A G_0 b_\ell \Vert _2\) and the \(a_i\), one can disentangle the weights \(b_\ell \) by simple algebraic manipulations. Our method of identifying the weights of the network is fully constructive, with quantifiable sample complexity, and therefore helps to reduce the black-box nature of the network training phase. We corroborate our theoretical results by extensive numerical experiments, which confirm the effectiveness and feasibility of the proposed algorithmic pipeline.
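The final disentangling step admits a simple linear-algebraic sketch: once \(A\) and \(G_0\) are known, each \(b_\ell\) is recovered (up to the positive scale lost in the normalization) by applying the pseudo-inverse of \(A G_0\) to \(v_\ell\). The ground-truth weights below are assumptions for illustration; the paper's "simple algebraic manipulations" are sketched here as a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m0, m1 = 5, 3, 2

# Hypothetical ground-truth weights; G0 is diagonal, as in the paper.
A = rng.standard_normal((d, m0))
G0 = np.diag(rng.uniform(0.5, 1.5, size=m0))
B = rng.standard_normal((m0, m1))

# Entangled unit vectors v_l = A G0 b_l / ||A G0 b_l||_2, as columns of V.
V = A @ G0 @ B
V = V / np.linalg.norm(V, axis=0)

# With A and G0 known, each b_l is recovered up to a positive scale
# by least squares: b_l ∝ (A G0)^+ v_l.
B_hat = np.linalg.pinv(A @ G0) @ V
```

Since \(A G_0\) has full column rank almost surely for \(d > m_0\), the recovered columns of `B_hat` equal the true \(b_\ell\) up to the positive normalization factor \(1/\Vert A G_0 b_\ell \Vert _2\).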




Updated: 2021-07-01