LU decomposition and Toeplitz decomposition of a neural network
Applied and Computational Harmonic Analysis (IF 2.5), Pub Date: 2023-10-06, DOI: 10.1016/j.acha.2023.101601
Yucong Liu, Simiao Jiao, Lek-Heng Lim

Any matrix $A$ has an LU decomposition up to a row or column permutation. Less well known is the fact that it has a 'Toeplitz decomposition' $A = T_1 T_2 \cdots T_r$, where the $T_i$'s are Toeplitz matrices. We will prove that any continuous function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ has an approximation to arbitrary accuracy by a neural network that maps $x \in \mathbb{R}^n$ to $L_1 \sigma_1 U_1 \sigma_2 L_2 \sigma_3 U_2 \cdots L_r \sigma_{2r-1} U_r x \in \mathbb{R}^m$, i.e., where the weight matrices alternate between lower and upper triangular matrices, $\sigma_i(x) := \sigma(x - b_i)$ for some bias vector $b_i$, and the activation $\sigma$ may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., $f \approx T_1 \sigma_1 T_2 \sigma_2 \cdots \sigma_{r-1} T_r$ to arbitrary accuracy, and likewise for Hankel matrices. A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary-width versions. Since our results apply in particular to the case when $f$ is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. We will present several experiments on real data sets to show that imposing such structures on the weight matrices dramatically reduces the number of training parameters with almost no noticeable effect on test accuracy.
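For concreteness, here is a minimal PyTorch sketch (ours, not the authors' code) of the two structured layers the abstract describes: a triangular affine layer for the LU-style network $L_1 \sigma_1 U_1 \cdots \sigma_{2r-1} U_r x$, and a Toeplitz affine layer. The class names TriangularLayer and ToeplitzLayer, the helper lu_network, and all hyperparameters are illustrative assumptions; the masking trick below is one straightforward way to constrain the weights, not necessarily the parameterization used in the paper's experiments.

```python
import torch
import torch.nn as nn


class TriangularLayer(nn.Module):
    """Affine layer x -> Wx + b with W masked to be lower (or upper)
    triangular, so only n(n+1)/2 of the n^2 entries are effective."""

    def __init__(self, dim: int, lower: bool = True):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) / dim**0.5)
        self.bias = nn.Parameter(torch.zeros(dim))
        ones = torch.ones(dim, dim)
        self.register_buffer("mask", torch.tril(ones) if lower else torch.triu(ones))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight * self.mask).T + self.bias


class ToeplitzLayer(nn.Module):
    """Affine layer whose weight matrix T is Toeplitz: T[i, j] depends only
    on i - j, giving 2n - 1 free parameters instead of n^2."""

    def __init__(self, dim: int):
        super().__init__()
        self.coeffs = nn.Parameter(torch.randn(2 * dim - 1) / dim**0.5)
        self.bias = nn.Parameter(torch.zeros(dim))
        idx = torch.arange(dim)
        # Entry (i, j) of the index map is i - j, shifted to be nonnegative.
        self.register_buffer("index", idx.unsqueeze(1) - idx.unsqueeze(0) + dim - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = self.coeffs[self.index]  # materialize the Toeplitz matrix
        return x @ T.T + self.bias


def lu_network(dim: int, r: int, act=nn.ReLU) -> nn.Sequential:
    """Fixed-width network computing L_1 sigma_1 U_1 ... sigma_{2r-1} U_r x.
    U_r is applied first, so the upper layer comes first in the Sequential."""
    layers = []
    for _ in range(r):
        layers += [TriangularLayer(dim, lower=False), act(),
                   TriangularLayer(dim, lower=True), act()]
    return nn.Sequential(*layers[:-1])  # no activation after the final L_1


if __name__ == "__main__":
    net = lu_network(dim=64, r=3)     # 2r = 6 triangular weight matrices
    out = net(torch.randn(8, 64))     # batch of 8 inputs in R^64
    print(out.shape)                  # torch.Size([8, 64])
```

The parameter counts make the abstract's point concrete: a dense n-by-n layer has n^2 weights, the triangular layer n(n+1)/2, and the Toeplitz layer only 2n - 1.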



Updated: 2023-10-06