Node classification over bipartite graphs through projection
Machine Learning ( IF 7.5 ) Pub Date : 2020-07-28 , DOI: 10.1007/s10994-020-05898-0
Marija Stankova , Stiene Praet , David Martens , Foster Provost

Many real-world large datasets correspond to bipartite graph data settings—think for example of users rating movies or people visiting locations. Although there has been some prior work on data analysis with such bigraphs, no general network-oriented methodology has been proposed yet to perform node classification. In this paper we propose a three-stage classification framework that effectively deals with the typical very large size of such datasets. The stages are: (1) top node weighting, (2) projection to a weighted unigraph, and (3) application of a relational classifier. This paper has two major contributions. Firstly, this general framework allows us to explore the design space, by applying different choices at the three stages, introducing new alternatives and mixing-and-matching to create new techniques. We present an empirical study of the predictive and run-time performances for different combinations of functions in the three stages over a large collection of bipartite datasets with sizes of up to 20 million × 30 million nodes. Secondly, thinking of classification on bigraph data in terms of the three-stage framework opens up the design space of possible solutions, where existing and novel functions can be mixed and matched, and tailored to the problem at hand. Indeed, in this work a novel, fast, accurate and comprehensible method emerges, called the SW-transformation, as one of the best-performing combinations in the empirical study.
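
To make the three stages concrete, here is a minimal Python sketch of one possible combination. It is not the authors' implementation and not the SW-transformation. The specific choices are illustrative assumptions:

- inverse-degree weighting of top nodes (stage 1);
- a projection where two bottom nodes are linked with the summed weight of the top nodes they share (stage 2);
- a single-pass weighted-vote relational neighbor (wvRN) classifier without collective inference (stage 3).

The function names and the toy users/movies data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weight_top_nodes(edges):
    """Stage 1 (assumed choice): weight each top node by 1 / degree, so a
    top node shared by many bottom nodes (e.g. a blockbuster movie) counts less."""
    degree = defaultdict(int)
    for _bottom, top in set(edges):  # ignore duplicate (bottom, top) pairs
        degree[top] += 1
    return {top: 1.0 / d for top, d in degree.items()}


def project(edges, top_weight):
    """Stage 2 (assumed choice): weighted unigraph over bottom nodes, where
    edge (i, j) gets the sum of the weights of the top nodes i and j share."""
    members = defaultdict(set)
    for bottom, top in edges:
        members[top].add(bottom)
    graph = defaultdict(lambda: defaultdict(float))
    for top, bottoms in members.items():
        w = top_weight[top]
        # sorted() only makes the pair order deterministic
        for i, j in combinations(sorted(bottoms), 2):
            graph[i][j] += w
            graph[j][i] += w
    return graph


def wvrn_scores(graph, labels, nodes):
    """Stage 3 (assumed choice): weighted-vote relational neighbor.
    Score = weighted fraction of labeled neighbors with label 1; nodes with
    no labeled neighbor fall back to the training base rate."""
    prior = sum(labels.values()) / len(labels)
    scores = {}
    for i in nodes:
        num = den = 0.0
        for j, w in graph.get(i, {}).items():  # .get avoids creating entries
            if j in labels:
                num += w * labels[j]
                den += w
        scores[i] = num / den if den > 0 else prior
    return scores


# Toy example: users (bottom nodes) rating movies (top nodes).
edges = [("u1", "m1"), ("u2", "m1"), ("u3", "m1"), ("u1", "m2"),
         ("u4", "m2"), ("u3", "m3"), ("u4", "m3"), ("u5", "m4")]
weights = weight_top_nodes(edges)
graph = project(edges, weights)
labels = {"u1": 1, "u2": 0, "u4": 0}
scores = wvrn_scores(graph, labels, ["u3", "u5"])
print({k: round(v, 3) for k, v in scores.items()})  # → {'u3': 0.286, 'u5': 0.333}
```

Enumerating pairs per top node costs time quadratic in that node's degree. At the scale the abstract mentions (tens of millions of nodes), the same projection would more plausibly be done with sparse matrix operations, or skipped by folding stages 2 and 3 together.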

Updated: 2020-07-28