Efficient access methods for very large distributed graph databases
Information Sciences, Pub Date: 2021-05-26, DOI: 10.1016/j.ins.2021.05.047
David Luaces, José R.R. Viqueira, José M. Cotos, Julián C. Flores

Subgraph searching is an essential problem in graph databases, but it is also challenging because it involves subgraph isomorphism, an NP-complete sub-problem. Filter-Then-Verify (FTV) methods mitigate this overhead by using an index in a filtering stage to prune graphs that cannot match the query, which reduces the number of subgraph isomorphism tests in the subsequent verification stage. In real applications such as molecular substructure searching, subgraph searching must be applied to very large databases (tens of millions of graphs). Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs); however, they cannot reach reasonable performance on very large ones (tens of millions of graphs). This paper proposes a generic approach for the distributed implementation of FTV solutions. In addition, three previous methods that improve the performance of GGSX and CT-Index are adapted to run on clusters. The evaluation shows that the resulting solutions reduce filtering time by between 70% and 90% in a centralized configuration, and that they can be used to achieve efficient subgraph searching over very large databases in cluster configurations.
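
The filter-then-verify idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, nor the actual GGSX or CT-Index data structures: it assumes a toy index that counts label sequences of short simple paths (loosely in the spirit of path-based indexes), and a plain backtracking test for the verification stage. All names (`make_graph`, `path_features`, `build_index`, `is_subgraph`, `search`) and the example graphs are hypothetical.

```python
from collections import Counter

def make_graph(labels, edges):
    """Undirected labeled graph: labels maps vertex -> label."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": labels, "adj": adj}

def path_features(g, max_len=3):
    """Count label sequences of simple paths with at most max_len vertices."""
    feats = Counter()
    def walk(path):
        feats[tuple(g["labels"][v] for v in path)] += 1
        if len(path) < max_len:
            for n in g["adj"][path[-1]]:
                if n not in path:
                    walk(path + [n])
    for v in g["labels"]:
        walk([v])
    return feats

def build_index(db, max_len=3):
    """Toy index (an assumption of this sketch): one feature counter per graph."""
    return {gid: path_features(g, max_len) for gid, g in db.items()}

def is_subgraph(q, g):
    """Verification: backtracking search for a label-preserving injective
    mapping of q's vertices into g that preserves every edge of q."""
    q_vertices = list(q["labels"])
    def extend(mapping):
        if len(mapping) == len(q_vertices):
            return True
        u = q_vertices[len(mapping)]
        for cand in g["labels"]:
            if cand in mapping.values() or g["labels"][cand] != q["labels"][u]:
                continue
            if all(mapping[w] in g["adj"][cand] for w in q["adj"][u] if w in mapping):
                mapping[u] = cand
                if extend(mapping):
                    return True
                del mapping[u]
        return False
    return extend({})

def search(db, index, q, max_len=3):
    qf = path_features(q, max_len)
    # Filtering: a graph survives only if it has at least as many occurrences
    # of every query feature; this never discards a true answer.
    candidates = [gid for gid, gf in index.items()
                  if all(gf[f] >= c for f, c in qf.items())]
    # Verification: run the expensive test on the surviving candidates only.
    answers = [gid for gid in candidates if is_subgraph(q, db[gid])]
    return candidates, answers

# Hypothetical three-graph database and a triangle query.
db = {
    "ring": make_graph({0: "C", 1: "C", 2: "C", 3: "C"},
                       [(0, 1), (1, 2), (2, 0), (0, 3)]),
    "star": make_graph({0: "C", 1: "C", 2: "C", 3: "C", 4: "C"},
                       [(0, 1), (0, 2), (0, 3), (0, 4)]),
    "chain": make_graph({0: "C", 1: "O"}, [(0, 1)]),
}
query = make_graph({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (2, 0)])

index = build_index(db)
candidates, answers = search(db, index, query)
print(candidates, answers)
```

In this example the filter discards `chain` without any isomorphism test, while `star` passes the filter but is rejected during verification. That false positive is why the verification stage remains necessary, and why a more selective index reduces the total cost.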




Updated: 2021-06-09