Accelerating Large-Scale Heterogeneous Interaction Graph Embedding Learning via Importance Sampling
ACM Transactions on Knowledge Discovery from Data (IF 3.6). Pub Date: 2020-12-07. DOI: 10.1145/3418684
Yugang Ji, Mingyang Yin, Hongxia Yang, Jingren Zhou, Vincent W. Zheng, Chuan Shi, Yuan Fang

In real-world problems, heterogeneous entities are often related through multiple kinds of interactions, forming a Heterogeneous Interaction Graph (HIG). When modeling HIGs for fundamental tasks, graph neural networks offer an attractive opportunity to exploit the heterogeneity and rich semantic information by aggregating and propagating information from different types of neighborhoods. However, learning on such complex graphs, which often contain millions or billions of nodes and edges with various attributes, can incur prohibitive time and memory costs. In this article, we accelerate representation learning on large-scale HIGs by applying importance sampling to heterogeneous neighborhoods in a batch-wise manner, which fits naturally with most batch-based optimizations. Unlike traditional homogeneous strategies, which neglect the semantic types of nodes and edges, we devise two samplers to handle the rich heterogeneous semantics within HIGs: a type-dependent sampler, which samples the neighborhoods of each type separately, and a type-fusion sampler, which samples jointly from the candidates of all types. Furthermore, to overcome the imbalance between the down-sampled and the original information, we propose heterogeneous estimators, including a self-normalized estimator and an adaptive estimator, to improve the robustness of our sampling strategies. Finally, we evaluate our models on node classification and link prediction over five real-world datasets. The empirical results demonstrate that our approach performs significantly better than other state-of-the-art alternatives, while reducing the number of edges in computation by up to 93%, the memory cost by up to 92%, and the time cost by up to 86%.
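To make the two sampling schemes and the self-normalized estimator concrete, the sketch below shows one minimal way they could work. Everything here is an illustrative assumption rather than the paper's actual implementation: the neighbor representation (a list of `(neighbor, type, weight)` triples), the importance weights, and the function names are all hypothetical stand-ins for the paper's learned importance scores.

```python
import random
from collections import defaultdict

# Hypothetical HIG neighborhood format: a list of (neighbor, type, weight)
# triples, where `weight` stands in for an importance score (the paper's
# actual scores are derived differently).

def type_dependent_sample(neighbors, k_per_type):
    """Type-dependent sampler: draw up to k neighbors within each type,
    using weights normalized per type as sampling probabilities."""
    by_type = defaultdict(list)
    for nbr, ntype, w in neighbors:
        by_type[ntype].append((nbr, w))
    sampled = {}
    for ntype, cands in by_type.items():
        total = sum(w for _, w in cands)
        probs = [w / total for _, w in cands]
        k = min(k_per_type, len(cands))
        idx = random.choices(range(len(cands)), weights=probs, k=k)
        # Keep each sample's probability so an estimator can correct for it.
        sampled[ntype] = [(cands[i][0], probs[i]) for i in idx]
    return sampled

def type_fusion_sample(neighbors, k):
    """Type-fusion sampler: draw k neighbors jointly from the pooled
    candidates of all types, under one shared distribution."""
    total = sum(w for _, _, w in neighbors)
    probs = [w / total for _, _, w in neighbors]
    idx = random.choices(range(len(neighbors)),
                         weights=probs, k=min(k, len(neighbors)))
    return [(neighbors[i][0], neighbors[i][1], probs[i]) for i in idx]

def self_normalized_estimate(samples, values):
    """Self-normalized importance estimator: reweight each sampled value
    by 1/q and renormalize by the sum of those weights, so the aggregate
    stays on the original scale despite using only a subset of neighbors."""
    inv = [1.0 / q for _, q in samples]
    z = sum(inv)
    return sum(values[nbr] * w / z for (nbr, _), w in zip(samples, inv))
```

The design point the sketch tries to capture is that both samplers record each draw's probability, so the estimator can divide it back out; self-normalization then keeps the aggregated message comparable to a full-neighborhood sum even when the sample is small or the probabilities are skewed.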
