Hybrid search plan generation for generalized graph pattern matching

https://doi.org/10.1016/j.jlamp.2020.100563

Highlights

  • Constraints in a graph query are represented uniformly.

  • Static information allows considering the graph query's structure.

  • Dynamic information allows tailoring to host graph heterogeneities.

  • Filtering effects of constraint checks are considered.

Abstract

In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of interconnected data. To extract useful information from the growing network structures, efficient querying techniques are required.

In this paper, we propose an approach for graph pattern matching that allows a uniform handling of arbitrary constraints over the query vertices. Our technique builds on a previously introduced matching algorithm, which takes concrete host graph information into account to dynamically adapt the employed search plan during query execution. The dynamic algorithm is combined with an existing static approach for search plan generation, resulting in a hybrid technique which we further extend by a more sophisticated handling of filtering effects caused by constraint checks. We evaluate the presented concepts empirically based on an implementation for our graph pattern matching tool, the Story Diagram Interpreter, with queries and data provided by the LDBC Social Network Benchmark. Our results suggest that the hybrid technique may improve search efficiency in several cases, and rarely reduces efficiency.

Introduction

In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of information [1]. The considered graphs frequently exhibit an inhomogeneous structure, which in the case of social networks can be caused by the diverse behaviour of different users, including extreme outliers such as celebrities. In order to extract useful information from the growing, heterogeneous network structures, efficient querying techniques are required.

In this paper, which is an extension of [2], we focus on queries without nesting or paths of varying length, which corresponds to the problem of graph pattern matching. Existing solutions usually work by iteratively mapping elements from a query specification to elements in a host graph according to a search plan. Since the order in which the individual elements are mapped has a substantial impact on performance, many solutions employ sophisticated strategies for determining good search plans.
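The effect of the mapping order can be illustrated with a minimal backtracking matcher. This is only a sketch under simplifying assumptions, not the algorithm of this paper: the host graph is a plain set of directed vertex pairs without types or parallel edges, matches are required to be injective, and the search plan is simply a fixed order of query vertices. The names `match`, `plan` and `tried` are hypothetical.

```python
def match(query_edges, host_vertices, host_edges, plan):
    """Enumerate injective matches by binding query vertices in `plan` order.

    Returns (matches, tried), where `tried` counts candidate bindings tested.
    """
    matches, tried = [], 0

    def consistent(binding):
        # Every query edge whose endpoints are both bound must exist in the host.
        return all(
            (binding[u], binding[v]) in host_edges
            for u, v in query_edges
            if u in binding and v in binding
        )

    def extend(i, binding):
        nonlocal tried
        if i == len(plan):
            matches.append(dict(binding))
            return
        q = plan[i]
        for h in host_vertices:
            if h in binding.values():
                continue  # keep the match injective (an assumption of this sketch)
            tried += 1
            binding[q] = h
            if consistent(binding):
                extend(i + 1, binding)
            del binding[q]

    extend(0, {})
    return matches, tried


# Query: a path a -> b -> c. Host: edges 1 -> 2, 2 -> 3 and 1 -> 4.
query_edges = [("a", "b"), ("b", "c")]
host_vertices = [1, 2, 3, 4]
host_edges = {(1, 2), (2, 3), (1, 4)}

# Plan 1 follows the query's edges; plan 2 binds the unconnected pair a, c first.
found_connected, tried_connected = match(
    query_edges, host_vertices, host_edges, ["a", "b", "c"]
)
found_detached, tried_detached = match(
    query_edges, host_vertices, host_edges, ["a", "c", "b"]
)
```

Both plans find the same match, but the second one tests more candidate bindings because no query edge connects `a` and `c`, so nothing can be pruned until `b` is bound.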

Most of these techniques consider only structural information to guide the matching process, that is, typing information and edges between nodes in the graph, such as relationships between persons in a social network. However, in many realistic application scenarios, nonstructural information also plays an important role. This includes attributes of nodes, such as the age of a person in the network, as well as external data structures such as indices, which are particularly relevant in the context of graph databases or the evaluation of decomposed queries [3]. Hence, a tighter integration of constraints specified over such nonstructural information into the matching process is desirable.

We therefore introduce a unified notion of constraints in a graph query. We then propose a general matching algorithm that is based on an existing dynamic technique [4], which allows the generation of a search plan on the fly as a query is being executed. On the one hand, the dynamic technique allows tailoring the search to heterogeneities in the host graph that cannot be handled by a static search plan. On the other hand, this approach has the drawback of not being able to consider the overall structure of the query, which can lead to shortsighted decisions during the matching process. To address this problem, we combine our adapted approach with a static but model-sensitive technique for search plan generation [5]. In addition to an analysis of its worst-case runtime complexity, the resulting hybrid approach is evaluated empirically using queries and datasets from the LDBC Social Network Benchmark [6]. Our analytical results confirm that the basic version of the hybrid technique has the same worst-case complexity as the considered static approach. However, our empirical results suggest that the hybrid technique may improve search efficiency in several cases, and rarely reduces efficiency.
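To make the idea of tailoring the search to host graph heterogeneities more concrete, the following sketch picks the next matching step based on the number of candidates that the already bound host vertex actually offers. It is an illustration only and does not reproduce the dynamic technique from [4]; the function `cheapest_next_step`, the step triples and the `out_neighbours` index are hypothetical names introduced for this example.

```python
def cheapest_next_step(steps, binding, out_neighbours):
    """Pick the pending step with the fewest host candidates for this binding.

    `steps` holds (source_query_vertex, edge_label, target_query_vertex) triples.
    Only steps whose source is bound and whose target is unbound are eligible.
    """
    eligible = [
        (len(out_neighbours.get((binding[src], label), ())), (src, label, dst))
        for src, label, dst in steps
        if src in binding and dst not in binding
    ]
    if not eligible:
        return None
    # Smallest candidate count wins; ties are broken by the step triple itself.
    return min(eligible)[1]


steps = [("p", "knows", "friend"), ("p", "likes", "post")]

# Hypothetical host graph index: (host vertex, edge label) -> neighbours.
out_neighbours = {
    ("alice", "knows"): ["bob", "carol"],
    ("alice", "likes"): ["post1", "post2", "post3"],
    ("celebrity", "knows"): [f"fan{i}" for i in range(1000)],
    ("celebrity", "likes"): ["post1"],
}

step_for_alice = cheapest_next_step(steps, {"p": "alice"}, out_neighbours)
step_for_celebrity = cheapest_next_step(steps, {"p": "celebrity"}, out_neighbours)
```

For the ordinary user the `knows` step is cheaper, whereas for the outlier the `likes` step is. A single static order cannot make both choices, but a purely local choice like this one also ignores the rest of the query, which is the shortsightedness noted above.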

Compared to [2], we have extended this paper by an analysis of the complexity of our algorithm in conjunction with an analytical evaluation of different strategies for search plan generation, facilitating a comparison to known subgraph isomorphism algorithms. Furthermore, we introduce two new variants of our hybrid approach for search plan generation. We consider these variants in an extended empirical evaluation, which now also includes a completely different static technique for search plan generation introduced in [7] and flat versions of the previously decomposed benchmark queries.

The remainder of the paper is structured as follows: Section 2 briefly introduces the basic notion of graphs, graph morphisms, and graph queries as used in this paper. We then present our generalised algorithm for graph pattern matching in Section 3 and give an overview of an existing static and a dynamic technique for search plan generation. In Section 4, we first integrate these strategies into a hybrid solution. Subsequently, several extensions and modifications of the resulting hybrid approach, such as a more sophisticated consideration of constraint checks during search plan generation, are outlined in Section 5. The developed concepts are evaluated analytically and empirically in Section 6, using a benchmark from the domain of social networks. Section 7 discusses related work and Section 8 concludes the paper.

Section snippets

Prerequisites

We briefly reintroduce the notion of graphs and graph morphisms [8]. A graph $G = (G^V, G^E, s^G, t^G)$ consists of a set of vertices $G^V$, a set of edges $G^E$, a source function $s^G: G^E \rightarrow G^V$ and a target function $t^G: G^E \rightarrow G^V$. Given two graphs $G = (G^V, G^E, s^G, t^G)$ and $H = (H^V, H^E, s^H, t^H)$, a graph morphism $f: G \rightarrow H$ is a pair of mappings $f^V: G^V \rightarrow H^V$ and $f^E: G^E \rightarrow H^E$ such that $f^V \circ s^G = s^H \circ f^E$ and $f^V \circ t^G = t^H \circ f^E$. In the remainder of this paper, we will refer to $f^V$ and $f^E$ as vertex morphism and edge morphism, respectively. If $f^V$ and $f^E$ are
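The morphism condition defined in this section can be checked mechanically. The sketch below is a minimal illustration, assuming graphs are stored as explicit vertex and edge sets with dictionaries for the source and target functions; the class `Graph` and the function `is_graph_morphism` are hypothetical names and not part of the paper's tooling.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    vertices: frozenset
    edges: frozenset
    source: dict  # edge -> source vertex
    target: dict  # edge -> target vertex


def is_graph_morphism(g, h, f_v, f_e):
    """Return True if the pair (f_v, f_e) is a graph morphism from g to h."""
    # Both mappings must be total on g and map into h.
    if set(f_v) != set(g.vertices) or set(f_e) != set(g.edges):
        return False
    if not set(f_v.values()) <= set(h.vertices):
        return False
    if not set(f_e.values()) <= set(h.edges):
        return False
    # Structure preservation: mapping an edge's source (target) vertex must give
    # the source (target) of the mapped edge.
    return all(
        f_v[g.source[e]] == h.source[f_e[e]]
        and f_v[g.target[e]] == h.target[f_e[e]]
        for e in g.edges
    )


# Query: a single edge e from a to b. Host: a path 1 -x-> 2 -y-> 3.
query = Graph(frozenset({"a", "b"}), frozenset({"e"}), {"e": "a"}, {"e": "b"})
host = Graph(
    frozenset({1, 2, 3}),
    frozenset({"x", "y"}),
    {"x": 1, "y": 2},
    {"x": 2, "y": 3},
)

valid = is_graph_morphism(query, host, {"a": 1, "b": 2}, {"e": "x"})
invalid = is_graph_morphism(query, host, {"a": 1, "b": 3}, {"e": "x"})
```

The second candidate fails because the target of `x` in the host graph is vertex 2, not the vertex 3 that `b` is mapped to.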

Search model and search plan generation

In order to represent all possible kinds of constraints specified in a graph query in a uniform manner, we propose a Search Model for graph queries. As displayed in a metamodel in Fig. 2, a Search Model consists of three types of elements, some of which are augmented with states to encode the state of a query execution.

Pattern Nodes represent vertices in the query graph Q and can either be in state BOUND, indicating that a mapping for the Pattern Node has already been determined, or UNBOUND

Hybrid search plan generation

To leverage the more accurate information available during the execution of a graph query while still considering the overall structure of the query, we propose a hybrid strategy for search plan generation. The combined approach can easily be integrated with the Search-Model-based matching algorithm presented in Section 3.1 in the form of a strategy for Pattern Constraint selection and is based on an adapted cost function for Matching Actions. The adapted cost function no longer only considers

Variants of hybrid search plan generation

In the following, we explore different directions for enhancing the hybrid approach to search plan generation introduced in Section 4.

Evaluation

In this section, we evaluate our approach for graph pattern matching based on a Search Model and hybrid search plan generation. In addition to the strategies introduced in the previous sections, we also consider a completely different static approach for search plan generation, GrGen, which was developed by Geiß et al. [7]. It is based on finding a minimum spanning tree of a so-called plan graph for a given graph query. While the preliminary computations of GrGen require much less effort

Related work

Subgraph isomorphism algorithms. Due to the importance of the subgraph isomorphism problem in multiple domains, several algorithms for finding subgraph isomorphisms have been proposed over the years. These algorithms are usually designed for labelled graphs, which are similar to the typed graphs introduced in Section 2 but do not allow parallel edges and do not constrain the types of an edge's source and target based on the edge type.

Ullman's algorithm [17] is an early but popular solution, which

Conclusion

In this paper, we developed an approach for graph pattern matching based on a Search Model representation of a graph query and a generalised version of the algorithm from [4]. We then integrated an existing static technique [5] with our dynamic algorithm and extended the resulting hybrid strategy by a more sophisticated handling of filtering effects. In addition, we considered a variant that takes a more pessimistic approach to computing required static cost estimates and an extension which

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (25)

  • R.M. Haralick et al. Increasing tree search efficiency for constraint satisfaction problems. Artif. Intell. (1980)

  • R. Angles. A comparison of current graph database models

  • M. Barkowsky et al. Hybrid search plan generation for generalized graph pattern matching

  • T. Beyhl et al. On the operationalization of graph queries with generalized discrimination networks

  • H. Giese et al. Improved flexibility and scalability by interpreting story diagrams. Electron. Commun. EASST (2009)

  • G. Varró et al. An algorithm for generating model-sensitive search plans for EMF models

  • O. Erling et al. The LDBC social network benchmark: interactive workload

  • R. Geiß et al. GrGen: a fast SPO-based graph rewriting tool

  • H. Ehrig et al. Fundamentals of Algebraic Graph Transformation (2006)

  • F. Bi et al. Efficient subgraph matching by postponing Cartesian products

  • T. Arendt et al. Henshin: advanced concepts and tools for in-place EMF model transformations

  • P. Foggia et al. A performance comparison of five algorithms for graph isomorphism

Funding: This work was supported by the Deutsche Forschungsgemeinschaft (grant number GI 765/8-1).