Learning subtree pattern importance for Weisfeiler-Lehman based graph kernels
Machine Learning (IF 7.5), Pub Date: 2021-06-13, DOI: 10.1007/s10994-021-05991-y
Dai Hai Nguyen , Canh Hao Nguyen , Hiroshi Mamitsuka

Graphs are a common representation of relational data, which are ubiquitous in many domains, such as molecules and biological and social networks. A popular approach to learning from graph-structured data is to use graph kernels, which measure the similarity between graphs and are plugged into a kernel machine such as a support vector machine. Weisfeiler-Lehman (WL) based graph kernels, which employ the WL labeling scheme to extract subtree patterns and perform node embedding, have been shown to achieve strong performance while remaining efficiently computable. However, one of the main drawbacks of a general kernel is that kernel construction is decoupled from the learning process. For molecular graphs, common kernels such as the WL subtree kernel, which are based on substructures of the molecules, treat all available substructures as equally important, which might not be suitable in practice. In this paper, we propose a method to learn the weights of subtree patterns in the framework of WWL kernels, the state-of-the-art method for the graph classification task (Togninalli et al., in: Advances in Neural Information Processing Systems, pp. 6439–6449, 2019). To overcome the computational issue on large-scale data sets, we present an efficient learning algorithm and also derive a generalization gap bound to show its convergence. Finally, through experiments on synthetic and real-world data sets, we demonstrate the effectiveness of our proposed method for learning the weights of subtree patterns.
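
To make the idea of weighting subtree patterns concrete, the following is a minimal illustrative sketch, not the authors' implementation. It uses the plain WL subtree kernel rather than the WWL kernel discussed in the abstract, and the weights are supplied by hand rather than learned. The names `wl_subtree_features`, `weighted_wl_kernel`, and `weights` are hypothetical and chosen only for this example.

```python
from collections import Counter


def wl_subtree_features(adj, labels, iterations=1):
    """Count WL subtree patterns of one graph.

    adj:    dict mapping each node to a list of its neighbours
    labels: dict mapping each node to its initial (hashable) label
    Returns a Counter keyed by (iteration, pattern).
    """
    current = dict(labels)
    features = Counter((0, lab) for lab in current.values())
    for h in range(1, iterations + 1):
        refined = {}
        for v in adj:
            # New label = own label plus the sorted multiset of neighbour labels.
            neigh = tuple(sorted((current[u] for u in adj[v]), key=repr))
            refined[v] = (current[v], neigh)
        current = refined
        features.update((h, lab) for lab in current.values())
    return features


def weighted_wl_kernel(f1, f2, weights=None):
    """Weighted dot product of two pattern-count vectors.

    weights maps a pattern key to a non-negative importance (assumed
    given here). Patterns without an entry get weight 1.0, so with no
    weights this is the standard unweighted WL subtree kernel.
    """
    weights = weights or {}
    return sum(weights.get(p, 1.0) * c * f2[p] for p, c in f1.items() if p in f2)


# Toy graphs: a path A-B-A and a single edge A-B.
g1 = wl_subtree_features({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"})
g2 = wl_subtree_features({0: [1], 1: [0]}, {0: "A", 1: "B"})

k_uniform = weighted_wl_kernel(g1, g2)
# Setting the weight of the initial "A" label to zero removes its contribution.
k_weighted = weighted_wl_kernel(g1, g2, weights={(0, "A"): 0.0})
```

With uniform weights every shared pattern contributes equally to the similarity. Changing a single weight changes the kernel value, which is the degree of freedom a weight-learning method can exploit.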




Updated: 2021-06-14