Distance metric learning for graph structured data
Machine Learning (IF 4.3), Pub Date: 2021-06-16, DOI: 10.1007/s10994-021-06009-3
Tomoki Yoshida, Ichiro Takeuchi, Masayuki Karasuyama

Graphs are versatile tools for representing structured data, and a variety of machine learning methods have been studied for graph data analysis. Although many of these methods depend on measuring differences between input graphs, how to define an appropriate distance metric for graphs remains an open and debated issue. We therefore propose a supervised distance metric learning method for the graph classification problem. Our method, named interpretable graph metric learning (IGML), learns discriminative metrics in a subgraph-based feature space, which has strong graph representation capability. By introducing a sparsity-inducing penalty on the weight of each subgraph, IGML can identify a small number of important subgraphs that provide insight into the given classification task. Because our formulation has a large number of optimization variables, we also propose an efficient algorithm that uses pruning techniques based on safe screening and working set selection. An important property of IGML is that the optimality of the solution is guaranteed, because the problem is formulated as a convex problem and our pruning strategies discard only unnecessary subgraphs. Furthermore, we show that IGML is also applicable to other structured data, such as itemset and sequence data, and that it can incorporate vertex-label similarity by using a transportation-based subgraph feature. We empirically evaluate the computational efficiency and classification performance of IGML on several benchmark datasets and provide illustrative examples of how IGML identifies important subgraphs in a given graph dataset.
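To make the core idea of a sparse, per-subgraph weighted distance concrete, here is a minimal Python sketch. It is not the authors' IGML implementation, and several pieces are assumptions: each graph is taken to be already summarized as a vector of subgraph-derived features (subgraph enumeration is omitted), the metric is a non-negative diagonal weighting d_m(x, y) = sum_k m_k (x_k - y_k)^2, and the weights are fit by proximal gradient descent on a squared-hinge triplet loss plus an L1 penalty. The triplet loss is swapped in for illustration because the abstract does not state the exact loss, and the names `weighted_dist`, `make_triplets`, and `fit_sparse_metric` are hypothetical. Safe screening and working set selection are not shown. Because d_m is linear in m, the squared hinge of an affine function, the L1 term, and the constraint m >= 0 are all convex, so this toy objective is convex in m, in the same spirit as the convexity property described above.

```python
# Illustrative sketch only (hypothetical names; not the authors' IGML code).
# Assumptions: graphs are pre-summarized as subgraph feature vectors, the
# metric is a non-negative diagonal weighting, and weights are learned by
# proximal gradient descent on a squared-hinge triplet loss + L1 penalty.


def weighted_dist(m, x, y):
    """d_m(x, y) = sum_k m_k * (x_k - y_k)^2, with every m_k >= 0."""
    return sum(mk * (a - b) ** 2 for mk, a, b in zip(m, x, y))


def make_triplets(labels):
    """All (anchor, same-class partner, other-class sample) index triplets."""
    n = len(labels)
    return [
        (i, pos, neg)
        for i in range(n)
        for pos in range(n)
        for neg in range(n)
        if i != pos and labels[i] == labels[pos] and labels[i] != labels[neg]
    ]


def fit_sparse_metric(X, labels, lam=0.05, margin=1.0, lr=0.01, epochs=200):
    """Learn non-negative, sparse per-feature weights m (hypothetical API)."""
    d = len(X[0])
    m = [0.0] * d
    triplets = make_triplets(labels)
    for _ in range(epochs):
        grad = [0.0] * d
        for i, pos, neg in triplets:
            # Margin violation; the loss is 0.5 * max(0, viol)^2 (squared hinge).
            viol = (margin + weighted_dist(m, X[i], X[pos])
                    - weighted_dist(m, X[i], X[neg]))
            if viol > 0:
                for k in range(d):
                    near = (X[i][k] - X[pos][k]) ** 2
                    far = (X[i][k] - X[neg][k]) ** 2
                    grad[k] += viol * (near - far)
        for k in range(d):
            # Gradient step, then the proximal operator of lam * m_k on m_k >= 0:
            # soft-threshold and clip at zero, which drives unhelpful weights to 0.
            m[k] = max(0.0, m[k] - lr * grad[k] / len(triplets) - lr * lam)
    return m


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes; feature 1 is class-independent noise.
    X = [[0, 0], [0, 1], [0, 2], [3, 0], [3, 1], [3, 2]]
    labels = [0, 0, 0, 1, 1, 1]
    m = fit_sparse_metric(X, labels)
    selected = [k for k, w in enumerate(m) if w > 0]
    print(m, selected)
```

On this toy data the noise feature receives a weight of exactly zero while the discriminative feature keeps a positive weight. This mirrors, in miniature, how a sparsity-inducing penalty can single out a small set of informative subgraphs.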

Updated: 2021-06-17