Wide Graph Neural Networks: Aggregation Provably Leads to Exponentially Trainability Loss
arXiv - CS - Machine Learning. Pub Date: 2021-03-03, DOI: arxiv-2103.03113
Wei Huang, Yayong Li, Weitao Du, Richard Yi Da Xu, Jie Yin, Ling Chen

Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. However, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to become indistinguishable as more layers are stacked. Although extensive research has confirmed this prevailing understanding, few theoretical analyses have studied the expressivity and trainability of deep GCNs. In this work, we obtain these characterizations by studying the Gaussian Process Kernel (GPK) and Graph Neural Tangent Kernel (GNTK) of an infinitely-wide GCN, corresponding to the analysis of expressivity and trainability, respectively. We first prove that the expressivity of infinitely-wide GCNs decays at an exponential rate by applying mean-field theory to the GPK. In addition, we formulate the asymptotic behavior of the GNTK in the large-depth limit, which reveals that the trainability of wide and deep GCNs drops at an exponential rate. Furthermore, we extend our theoretical framework to analyze residual-connection-like techniques. We find that these techniques can mildly mitigate the exponential decay but fail to overcome it fundamentally. Finally, all theoretical results in this work are corroborated experimentally on a variety of graph-structured datasets.
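
To make the exponential-decay claim concrete, below is a small numerical sketch (an illustration under standard mean-field/NTK assumptions, not the paper's exact formulation). It iterates a GPK recursion and a simplified GNTK recursion for an infinitely-wide ReLU GCN on a toy 4-node cycle graph; the graph, the weight variance sw2, and the closed-form arc-cosine ReLU moments are assumptions introduced here for illustration.

import numpy as np

def relu_moments(K):
    # Arc-cosine closed forms for u ~ N(0, K):
    # E[relu(u) relu(u)^T] and E[relu'(u) relu'(u)^T].
    d = np.sqrt(np.diag(K))
    cos = np.clip(K / np.outer(d, d), -1.0, 1.0)
    theta = np.arccos(cos)
    E = np.outer(d, d) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    E_dot = (np.pi - theta) / (2 * np.pi)
    return E, E_dot

# Toy graph: a 4-cycle with self-loops, symmetrically normalized aggregation.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float) + np.eye(4)
deg = A.sum(1)
A_hat = A / np.sqrt(np.outer(deg, deg))

sw2 = 2.0          # weight variance (He-style initialization), an assumption
K = np.eye(4)      # input GPK: orthogonal node features
Theta = K.copy()   # GNTK accumulator
for layer in range(1, 31):
    E, E_dot = relu_moments(K)
    # GPK recursion: aggregate the post-activation covariance.
    K_next = A_hat @ (sw2 * E) @ A_hat.T
    # Simplified GNTK recursion: backprop factor E_dot times the running
    # kernel, plus the fresh-layer contribution, then aggregation.
    Theta = A_hat @ (sw2 * (Theta * E_dot) + sw2 * E) @ A_hat.T
    K = K_next
    if layer % 5 == 0:
        c_gpk = K[0, 2] / np.sqrt(K[0, 0] * K[2, 2])
        c_ntk = Theta[0, 2] / np.sqrt(Theta[0, 0] * Theta[2, 2])
        print(f"layer {layer:2d}: 1-corr_GPK = {1 - c_gpk:.2e}, "
              f"1-corr_GNTK = {1 - c_ntk:.2e}")

Running this prints gaps 1 - corr between two non-adjacent nodes that shrink by a roughly constant factor every few layers, i.e. both kernels converge toward a degenerate (all-nodes-alike) limit exponentially fast in depth, mirroring the expressivity and trainability loss described above; a residual-style variant would correspond to adding a skip term to these recursions.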

Updated: 2021-03-05