A Large-Scale Database for Graph Representation Learning
arXiv - CS - Social and Information Networks. Pub Date: 2020-11-16, DOI: arxiv-2011.07682
Scott Freitas, Yuxiao Dong, Joshua Neil, Duen Horng Chau

With the rapid emergence of graph representation learning, the construction of new large-scale datasets is necessary to distinguish model capabilities and accurately assess the strengths and weaknesses of each technique. By carefully analyzing existing graph databases, we identify three components critical to advancing the field of graph representation learning: (1) large graphs, (2) many graphs, and (3) class diversity. To date, no single graph database offers all of these desired properties. We introduce MalNet, the largest public graph database ever constructed, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families. Compared to the popular REDDIT-12K database, MalNet offers 105x more graphs, 44x larger graphs on average, and 63x the classes. We provide a detailed analysis of MalNet, discussing its properties and provenance. The unprecedented scale and diversity of MalNet offer exciting opportunities to advance the frontiers of graph representation learning, enabling new discoveries and research into imbalanced classification, explainability, and the impact of class hardness. The database is publicly available at www.mal-net.org.

Updated: 2020-11-17