Affine symmetries and neural network identifiability
Advances in Mathematics ( IF 1.5 ) Pub Date : 2021-01-01 , DOI: 10.1016/j.aim.2020.107485
Verner Vlačić , Helmut Bölcskei

We address the following question of neural network identifiability: Suppose we are given a function $f:\mathbb{R}^m\to\mathbb{R}^n$ and a nonlinearity $\rho$. Can we specify the architecture, weights, and biases of all feed-forward neural networks with respect to $\rho$ giving rise to $f$? Existing literature on the subject suggests that the answer should be yes, provided we are only concerned with finding networks that satisfy certain "genericity conditions". Moreover, the identified networks are mutually related by symmetries of the nonlinearity. For instance, the $\tanh$ function is odd, and so flipping the signs of the incoming and outgoing weights of a neuron does not change the output map of the network. The results known hitherto, however, apply only to single-layer networks or to networks satisfying specific structural assumptions (such as full connectivity), and only for specific nonlinearities. In an effort to answer the identifiability question in greater generality, we consider arbitrary nonlinearities with potentially complicated affine symmetries, and we show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$. The set obtained in this manner is, in fact, exhaustive (i.e., it contains all networks giving rise to $f$) unless there exists a network $\mathcal{A}$ "with no internal symmetries" giving rise to the identically zero function. This result can thus be interpreted as an analog of the rank-nullity theorem for linear operators. We furthermore exhibit a class of "$\tanh$-type" nonlinearities (including the $\tanh$ function itself) for which such a network $\mathcal{A}$ does not exist, thereby solving the identifiability question for these nonlinearities in full generality. Finally, we show that this class contains nonlinearities with arbitrarily complicated symmetries.
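The sign-flip symmetry of $\tanh$ mentioned in the abstract can be checked numerically. The sketch below (illustrative only, not from the paper; all names and shapes are assumptions) builds a one-hidden-layer $\tanh$ network and verifies that negating the incoming weights and bias of one hidden neuron together with its outgoing weights leaves the network's output map unchanged, since $\tanh(-t) = -\tanh(t)$.

```python
import numpy as np

# Hypothetical small network: 3 inputs, 4 tanh hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # incoming weights of the hidden layer
b1 = rng.standard_normal(4)        # hidden-layer biases
W2 = rng.standard_normal((2, 4))   # outgoing weights of the hidden layer

def net(x, W1, b1, W2):
    """Output map of the one-hidden-layer tanh network."""
    return W2 @ np.tanh(W1 @ x + b1)

# Apply the odd-symmetry transformation to hidden neuron 0:
# flip the signs of its incoming weights/bias and its outgoing weights.
W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
W1f[0, :] *= -1
b1f[0] *= -1
W2f[:, 0] *= -1

# Because tanh is odd, the two networks realize the same function f.
x = rng.standard_normal(3)
print(np.allclose(net(x, W1, b1, W2), net(x, W1f, b1f, W2f)))  # True
```

The two parameter settings are distinct, yet they give rise to the same function $f$ — precisely the kind of symmetry-related family of networks the identifiability result accounts for.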

Updated: 2021-01-01