Shapley Homology: Topological Analysis of Sample Influence for Neural Networks
Neural Computation ( IF 2.9 ) Pub Date : 2020-07-01 , DOI: 10.1162/neco_a_01289
Kaixuan Zhang 1 , Qinglong Wang 2 , Xue Liu 2 , C Lee Giles 1

Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (i.i.d.). Recent research has demonstrated that this assumption can be problematic because it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. Our proposed framework consists of two main parts: homology analysis, where we compute the Betti number of the target topological space, and Shapley value calculation, where we decompose the topological features of a complex built from data points into contributions of individual points. By interpreting the influence as a probability measure, we further define an entropy that reflects the complexity of the data manifold. Furthermore, we provide a preliminary discussion of the connection between the Shapley homology and the Vapnik-Chervonenkis dimension. Empirical studies show that when the zero-dimensional Shapley homology is used on neighboring graphs, samples with higher influence scores have a greater impact on the accuracy of neural networks that determine graph connectivity. For several regular grammars, higher entropy values imply greater difficulty in being learned.
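To make the two parts of the framework concrete, the following is a minimal sketch of the zero-dimensional case on a small graph. It is an illustration under stated assumptions, not the authors' implementation: the characteristic function is assumed to be the zeroth Betti number (the number of connected components) of the subgraph induced by a subset of vertices, Shapley values are computed exactly by enumerating all orderings (feasible only for a handful of vertices), and the probability measure is assumed to come from normalizing absolute Shapley values. The function names `betti0`, `shapley_betti0`, and `influence_entropy` are hypothetical.

```python
from itertools import permutations
from math import log


def betti0(vertices, edges):
    """Zeroth Betti number: connected components of the subgraph induced by `vertices`."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        # Only edges with both endpoints in the subset belong to the induced subgraph.
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(v) for v in parent})


def shapley_betti0(vertices, edges):
    """Exact Shapley value of each vertex, with betti0 as the (assumed) characteristic function."""
    vertices = list(vertices)
    totals = {v: 0.0 for v in vertices}
    n_orders = 0
    for order in permutations(vertices):
        n_orders += 1
        seen, prev = [], 0  # the empty set has zero components
        for v in order:
            seen.append(v)
            cur = betti0(seen, edges)
            totals[v] += cur - prev  # marginal contribution of v in this ordering
            prev = cur
    return {v: t / n_orders for v, t in totals.items()}


def influence_entropy(values):
    """Shannon entropy (natural log) of normalized absolute Shapley values.

    Normalizing by absolute value is an assumption of this sketch.
    """
    total = sum(abs(x) for x in values.values())
    probs = [abs(x) / total for x in values.values() if x != 0]
    return -sum(p * log(p) for p in probs)


# Path graph a - b - c: one connected component overall.
values = shapley_betti0(["a", "b", "c"], [("a", "b"), ("b", "c")])
entropy = influence_entropy(values)
```

On this path graph the Shapley values sum to the Betti number of the whole graph (the efficiency property of the Shapley value). The middle vertex receives a value of zero because its marginal contributions across orderings cancel: it adds a component when it arrives first and merges two components when it arrives last.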

Updated: 2020-07-01