Top-k user-specified preferred answers in massive graph databases
Data & Knowledge Engineering (IF 2.7), Pub Date: 2020-02-14, DOI: 10.1016/j.datak.2020.101798
Noseong Park, Andrea Pugliese, Edoardo Serra, V.S. Subrahmanian

In many applications, users wish to identify subsets of vertices in a social network or graph database that are of interest to them. They may specify sets of patterns and vertex properties, each of which confers a score on a subgraph, and they want to find the k subgraphs with the highest scores. Real-world examples where such subgraphs involve custom scoring methods include techniques to identify sets of coordinated influence bots on Twitter, methods to identify suspicious subgraphs of nodes involved in nuclear proliferation networks, and methods to identify sets of sockpuppet accounts seeking to illicitly influence star ratings on e-commerce platforms. All of these types of applications have numerous custom scoring methods. This motivates the concept of Scoring Queries presented in this paper: unlike past work, an important aspect of scoring queries is that the users, not the system, choose the scoring mechanism. We present the Advanced top-k (ATK) algorithm and show that it leverages graph indexes from prior work while also exposing novel pruning opportunities. We present an implementation of ATK and show that it outperforms a baseline algorithm that builds on advanced subgraph matching methods, with multiple graph database backends including Jena and GraphDB. We show that ATK scales well on real-world graph databases from YouTube, Flickr, IMDb, and CiteSeerX.
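
To make the idea of a scoring query concrete, the following is a minimal Python sketch of a brute-force top-k search in which the caller supplies the scoring function. It is not the ATK algorithm and uses none of its indexes or pruning; the graph representation, the names `top_k_subgraphs`, `is_connected`, and `bot_score`, and the `"bot"` vertex property are all assumptions made for illustration only.

```python
import heapq
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]    # undirected graph as adjacency sets (assumed layout)
Props = Dict[str, Dict[str, object]]  # per-vertex properties (assumed layout)
Scorer = Callable[[Graph, Props, Iterable[str]], float]


def is_connected(graph: Graph, nodes: Iterable[str]) -> bool:
    """Return True if the subgraph induced by `nodes` is connected."""
    wanted = set(nodes)
    start = next(iter(wanted))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in graph[v] & wanted:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == wanted


def top_k_subgraphs(graph: Graph, props: Props, score: Scorer,
                    size: int, k: int) -> List[Tuple[float, List[str]]]:
    """Exhaustively score every connected vertex subset of the given size
    and keep the k best. Exponential in `size`; illustration only."""
    scored = (
        (score(graph, props, c), sorted(c))
        for c in combinations(sorted(graph), size)
        if is_connected(graph, c)
    )
    return heapq.nlargest(k, scored)


def bot_score(graph: Graph, props: Props, nodes: Iterable[str]) -> float:
    """Hypothetical user-chosen score: flagged vertices count double,
    plus one point per edge inside the subgraph."""
    members = sorted(nodes)
    flagged = sum(1 for v in members if props[v].get("bot"))
    edges = sum(1 for a, b in combinations(members, 2) if b in graph[a])
    return 2.0 * flagged + edges


if __name__ == "__main__":
    graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    props = {"a": {"bot": True}, "b": {"bot": True},
             "c": {"bot": False}, "d": {"bot": False}}
    for s, members in top_k_subgraphs(graph, props, bot_score, size=3, k=2):
        print(s, members)
```

Swapping `bot_score` for another function changes the ranking without touching the search, which is the property the abstract attributes to scoring queries. The exhaustive enumeration here is the kind of cost that an index-based, pruning algorithm would aim to avoid.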

Updated: 2020-02-14