Hybrid search plan generation for generalized graph pattern matching☆
Introduction
In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of information [1]. The considered graphs frequently exhibit an inhomogeneous structure, which in the case of social networks can be caused by the diverse behaviour of different users, including extreme outliers such as celebrities. In order to extract useful information from the growing, heterogeneous network structures, efficient querying techniques are required.
In this paper, which is an extension of [2], we focus on queries without nesting or paths of varying length, which corresponds to the problem of graph pattern matching. Existing solutions usually work by iteratively mapping elements from a query specification to elements in a host graph according to a search plan. Since the order in which the individual elements are mapped has a substantial impact on performance, many solutions employ sophisticated strategies for determining good search plans.
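To make the effect of the matching order concrete, the following Python sketch binds query vertices in a fixed order (a static search plan) and counts how many candidate assignments it tries. The toy graph, the pattern, and the helper names (`match_with_plan`, `candidates`, `consistent`) are illustrative assumptions, not part of the approach described in this paper. Where possible, candidates are taken from the host adjacency of an already bound neighbour; otherwise all host vertices are scanned.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source, target)


def match_with_plan(host: Set[Edge], query: List[Edge], plan: List[str]):
    """Backtracking matcher that binds query vertices in the fixed order `plan`.

    Returns (matches, tried), where `tried` counts candidate assignments,
    a rough proxy for the search effort caused by the chosen order.
    """
    host_vertices = sorted({v for e in host for v in e})
    matches: List[Dict[str, str]] = []
    tried = 0

    def candidates(q: str, binding: Dict[str, str]) -> List[str]:
        # Navigate along a query edge to an already bound neighbour if possible.
        for s, t in query:
            if s == q and t in binding:
                return [u for (u, w) in host if w == binding[t]]
            if t == q and s in binding:
                return [w for (u, w) in host if u == binding[s]]
        return host_vertices  # no bound neighbour: scan every host vertex

    def consistent(binding: Dict[str, str]) -> bool:
        # Every query edge with both endpoints bound must exist in the host.
        return all((binding[s], binding[t]) in host
                   for s, t in query if s in binding and t in binding)

    def extend(i: int, binding: Dict[str, str]) -> None:
        nonlocal tried
        if i == len(plan):
            matches.append(dict(binding))
            return
        q = plan[i]
        for v in candidates(q, binding):
            if v in binding.values():
                continue  # keep the vertex mapping injective
            tried += 1
            binding[q] = v
            if consistent(binding):
                extend(i + 1, binding)
            del binding[q]

    extend(0, {})
    return matches, tried


# Five fans follow a celebrity "c", who in turn follows "d".
host = {(f"f{i}", "c") for i in range(5)} | {("c", "d")}
query = [("x", "y"), ("y", "z")]  # pattern: x -> y -> z
good, t_good = match_with_plan(host, query, ["x", "y", "z"])
bad, t_bad = match_with_plan(host, query, ["x", "z", "y"])
print(len(good), t_good, len(bad), t_bad)  # 5 18 5 79
```

Both plans find the same five matches, but the order x, z, y binds z before any of its neighbours. This forces a scan of all host vertices and leads to more than four times as many tried assignments, even in this tiny graph.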
Most of these techniques consider only structural information, that is, typing information and edges between nodes in the graph (such as relationships between persons in a social network), to guide the matching process. However, in many realistic application scenarios, nonstructural information also plays an important role. This includes node attributes such as the age of a person in the network, as well as external data structures such as indices, which are particularly relevant in the context of graph databases or the evaluation of decomposed queries [3]. Hence, a tighter integration of constraints specified over such nonstructural information into the matching process is desirable.
We therefore introduce a unified notion of constraints in a graph query. We then propose a general matching algorithm that is based on an existing dynamic technique [4], which allows the generation of a search plan on the fly as a query is being executed. On the one hand, the dynamic technique allows tailoring the search to heterogeneities in the host graph that cannot be handled by a static search plan. On the other hand, this approach has the drawback of not being able to consider the overall structure of the query, which can lead to shortsighted decisions during the matching process. To address this problem, we combine our adapted approach with a static but model-sensitive technique for search plan generation [5]. In addition to an analysis of its worst-case runtime complexity, the resulting hybrid approach is evaluated empirically using queries and datasets from the LDBC Social Network Benchmark [6]. Our analytical results confirm that the basic version of the hybrid technique has the same worst-case complexity as the considered static approach. However, our empirical results suggest that the hybrid technique may improve search efficiency in several cases, and rarely reduces efficiency.
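Generating a search plan on the fly can be illustrated with a small greedy matcher. Before each step, it computes the candidates that remain for every unbound query vertex under the current partial match and binds the vertex with the fewest. This is a simplified sketch: the cost measure (plain candidate counts), the names, and the toy graph are assumptions and do not reproduce the cost model of [4].

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source, target)


def dynamic_match(host: Set[Edge], query: List[Edge]) -> List[Tuple[Dict[str, str], Tuple[str, ...]]]:
    """Backtracking matcher whose search order is chosen on the fly.

    At every step it binds the unbound query vertex with the fewest host
    candidates under the current partial match (ties broken by name).
    Returns each match together with the order in which it was bound.
    """
    query_vertices = sorted({v for e in query for v in e})
    host_vertices = {v for e in host for v in e}
    results = []

    def candidates(q: str, binding: Dict[str, str]) -> Set[str]:
        cands = host_vertices - set(binding.values())  # injective mapping
        for s, t in query:
            # Each query edge to an already bound vertex narrows the set.
            if s == q and t in binding:
                cands &= {u for (u, w) in host if w == binding[t]}
            if t == q and s in binding:
                cands &= {w for (u, w) in host if u == binding[s]}
            if s == q and t == q:  # query self-loop
                cands &= {u for (u, w) in host if u == w}
        return cands

    def extend(binding: Dict[str, str]) -> None:
        unbound = [q for q in query_vertices if q not in binding]
        if not unbound:
            # dicts keep insertion order, so the keys record the binding order
            results.append((dict(binding), tuple(binding)))
            return
        # The next step is decided at runtime, separately for each partial match.
        options = {q: candidates(q, binding) for q in unbound}
        q = min(unbound, key=lambda u: (len(options[u]), u))
        for v in sorted(options[q]):
            binding[q] = v
            extend(binding)
            del binding[q]

    extend({})
    return results


# "c" has many followers and one followee; "h" has one follower and many followees.
host = ({(f"f{i}", "c") for i in range(4)} | {("c", "d"), ("e", "h")}
        | {("h", f"g{i}") for i in range(4)})
query = [("z", "x"), ("x", "y")]  # pattern: z -> x -> y
results = dynamic_match(host, query)
print(sorted({(m["x"], order) for m, order in results}))
# [('c', ('x', 'y', 'z')), ('h', ('x', 'z', 'y'))]
```

When x is mapped to c, the matcher binds y next because c has a single successor. When x is mapped to h, it binds z first because h has a single predecessor. A single static plan would have to commit to one of these orders for both cases.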
Compared to [2], we have extended this paper by an analysis of the complexity of our algorithm in conjunction with an analytical evaluation of different strategies for search plan generation, facilitating a comparison to known subgraph isomorphism algorithms. Furthermore, we introduce two new variants of our hybrid approach for search plan generation. We consider these variants in an extended empirical evaluation, which now also includes a completely different static technique for search plan generation introduced in [7] and flat versions of the previously decomposed benchmark queries.
The remainder of the paper is structured as follows: Section 2 briefly introduces the basic notion of graphs, graph morphisms, and graph queries as used in this paper. We then present our generalised algorithm for graph pattern matching in Section 3 and give an overview of an existing static and a dynamic technique for search plan generation. In Section 4, we first integrate these strategies into a hybrid solution. Subsequently, several extensions and modifications of the resulting hybrid approach, such as a more sophisticated consideration of constraint checks during search plan generation, are outlined in Section 5. The developed concepts are evaluated analytically and empirically in Section 6, using a benchmark from the domain of social networks. Section 7 discusses related work and Section 8 concludes the paper.
Prerequisites
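We briefly reintroduce the notion of graphs and graph morphisms [8]. A graph $G = (V^G, E^G, s^G, t^G)$ consists of a set of vertices $V^G$, a set of edges $E^G$, a source function $s^G: E^G \to V^G$ and a target function $t^G: E^G \to V^G$. Given two graphs $G$ and $H$, a graph morphism $m: G \to H$ is a pair of mappings $m_V: V^G \to V^H$ and $m_E: E^G \to E^H$ such that $m_V \circ s^G = s^H \circ m_E$ and $m_V \circ t^G = t^H \circ m_E$. In the remainder of this paper, we will refer to $m_V$ and $m_E$ as vertex morphism and edge morphism, respectively. If … and … are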
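The morphism condition can be checked mechanically. The following sketch represents a graph as a vertex set, a set of edge identifiers, and source and target maps, so parallel edges are allowed. It then tests whether a pair of mappings is total and preserves the sources and targets of all edges. The class and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Graph:
    """A graph with vertex set V, edge identifiers E, and source/target maps."""
    V: Set[str]
    E: Set[str]
    s: Dict[str, str]  # source function E -> V
    t: Dict[str, str]  # target function E -> V


def is_morphism(G: Graph, H: Graph, mV: Dict[str, str], mE: Dict[str, str]) -> bool:
    """Check whether (mV, mE) is a graph morphism from G to H."""
    # Both components must be total functions into the respective sets of H.
    if set(mV) != G.V or set(mE) != G.E:
        return False
    if not set(mV.values()) <= H.V or not set(mE.values()) <= H.E:
        return False
    # Sources and targets must be preserved: s_H(mE(e)) = mV(s_G(e)), same for t.
    return all(H.s[mE[e]] == mV[G.s[e]] and H.t[mE[e]] == mV[G.t[e]] for e in G.E)


G = Graph(V={"a", "b"}, E={"e1"}, s={"e1": "a"}, t={"e1": "b"})
H = Graph(V={"1", "2", "3"}, E={"k1", "k2"},
          s={"k1": "1", "k2": "2"}, t={"k1": "2", "k2": "3"})
print(is_morphism(G, H, {"a": "1", "b": "2"}, {"e1": "k1"}))  # True
print(is_morphism(G, H, {"a": "1", "b": "2"}, {"e1": "k2"}))  # False: k2 starts at 2, not 1
```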
Search model and search plan generation
In order to represent all possible kinds of constraints specified in a graph query in a uniform manner, we propose a Search Model for graph queries. As shown in the metamodel in Fig. 2, a Search Model consists of three types of elements, some of which carry a state that encodes the progress of a query execution.
Pattern Nodes represent vertices in the query graph Q and can either be in state BOUND, indicating that a mapping for the Pattern Node has already been determined, or UNBOUND
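The following minimal sketch shows how Pattern Nodes and a uniform treatment of constraints might be represented. Only the states BOUND and UNBOUND are taken from the text. The `PatternConstraint` class, its `check` callable, and the rule that a constraint can be evaluated once all its Pattern Nodes are bound are hypothetical. They were chosen to show a structural (edge) constraint and a nonstructural (attribute) constraint handled through the same interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class NodeState(Enum):
    BOUND = "BOUND"      # a mapping to a host vertex has been determined
    UNBOUND = "UNBOUND"  # no mapping determined yet


@dataclass
class PatternNode:
    name: str
    state: NodeState = NodeState.UNBOUND
    image: Optional[str] = None  # host vertex while BOUND

    def bind(self, host_vertex: str) -> None:
        self.image, self.state = host_vertex, NodeState.BOUND

    def unbind(self) -> None:
        self.image, self.state = None, NodeState.UNBOUND


@dataclass
class PatternConstraint:
    """Hypothetical uniform constraint over the images of some Pattern Nodes."""
    nodes: List[PatternNode]
    check: Callable[..., bool]  # receives the images of `nodes` in order

    def checkable(self) -> bool:
        # assumption: a constraint can be evaluated once all its nodes are bound
        return all(n.state is NodeState.BOUND for n in self.nodes)

    def holds(self) -> bool:
        return self.check(*(n.image for n in self.nodes))


knows = {("ann", "bob")}           # toy host edges
age = {"ann": 34, "bob": 19}       # toy node attribute
p, q = PatternNode("p"), PatternNode("q")
edge = PatternConstraint([p, q], lambda a, b: (a, b) in knows)  # structural
adult = PatternConstraint([q], lambda b: age[b] >= 18)         # attribute
p.bind("ann")
print(edge.checkable(), adult.checkable())  # False False
q.bind("bob")
print(edge.holds(), adult.holds())  # True True
```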
Hybrid search plan generation
To leverage the more accurate information available during the execution of a graph query while still considering the overall structure of the query, we propose a hybrid strategy for search plan generation. The combined approach can easily be integrated with the Search-Model-based matching algorithm presented in Section 3.1 in the form of a strategy for Pattern Constraint selection and is based on an adapted cost function for Matching Actions. The adapted cost function no longer only considers
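The adapted cost function itself is cut off in this excerpt. Purely as a hypothetical illustration of how runtime and static information could be combined, the sketch below multiplies the exact number of candidates of a Matching Action under the current partial match (dynamic information) by a precomputed estimate of the remaining work per candidate (static information). The class, its field names, and the product formula are assumptions and are not the cost function used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MatchingAction:
    name: str
    count_candidates: Callable[[], int]  # exact count under the current partial match (dynamic)
    static_rest_cost: float              # precomputed estimate of work per candidate afterwards (static)


def dynamic_choice(actions: List[MatchingAction]) -> MatchingAction:
    # Purely dynamic: cheapest next step, ignoring the rest of the query.
    return min(actions, key=lambda a: a.count_candidates())


def hybrid_choice(actions: List[MatchingAction]) -> MatchingAction:
    # Hypothetical hybrid cost: candidates now times estimated work per candidate later.
    return min(actions, key=lambda a: a.count_candidates() * a.static_rest_cost)


a1 = MatchingAction("a1", lambda: 2, 100.0)  # few candidates, expensive remainder
a2 = MatchingAction("a2", lambda: 3, 10.0)   # more candidates, cheap remainder
print(dynamic_choice([a1, a2]).name, hybrid_choice([a1, a2]).name)  # a1 a2
```

The example reproduces the kind of shortsighted decision described in the introduction: the purely dynamic rule prefers `a1` because of its smaller immediate candidate set, while the combined estimate favours `a2`.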
Variants of hybrid search plan generation
In the following, we explore different directions for enhancing the hybrid approach to search plan generation introduced in Section 4.
Evaluation
In this section, we evaluate our approach for graph pattern matching based on a Search Model and hybrid search plan generation. In addition to the strategies introduced in the previous sections, we also consider a completely different static approach for search plan generation, GrGen, which was developed by Geiß et al. [7]. It is based on finding a minimum spanning tree of a so-called plan graph for a given graph query. While the preliminary computations of GrGen require much less effort
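The following sketch shows how a minimum spanning tree can be turned into a search order. It simplifies the plan graph to an undirected graph whose edge weights are assumed matching costs between query vertices. It then uses Prim's algorithm, reading the tree edges in the order they are added as the sequence of matching steps. The plan graph and cost model of [7] are more elaborate and are not reproduced here. The function name and the weights are illustrative assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def prim_search_order(weights: Dict[Tuple[str, str], float], root: str) -> List[Tuple[str, str]]:
    """Minimum spanning tree of an undirected weighted graph via Prim's algorithm.

    `weights` maps a pair of query vertices to an estimated matching cost.
    The tree edges are returned in the order Prim adds them; each edge
    extends the match from an already reached vertex to a new one.
    """
    adj: Dict[str, List[Tuple[float, str]]] = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((w, v))
        adj.setdefault(v, []).append((w, u))
    visited = {root}
    heap = [(w, root, v) for w, v in adj.get(root, [])]
    heapq.heapify(heap)
    order: List[Tuple[str, str]] = []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        order.append((u, v))
        for w2, x in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (w2, v, x))
    return order


weights = {("a", "b"): 5.0, ("b", "c"): 1.0, ("a", "c"): 2.0, ("c", "d"): 4.0}
print(prim_search_order(weights, "a"))  # [('a', 'c'), ('c', 'b'), ('c', 'd')]
```

Because the tree is computed once before execution, the resulting order is static. Each step follows the cheapest edge that connects a new query vertex to the part already matched, and the expensive edge between a and b is avoided.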
Related work
Subgraph isomorphism algorithms. Due to the importance of the subgraph isomorphism problem in multiple domains, several algorithms for finding subgraph isomorphisms have been proposed over the years. These algorithms are usually designed for labelled graphs, which are similar to the typed graphs introduced in Section 2 but do not allow parallel edges and do not constrain the types of an edge's source and target based on the edge type.
Ullmann's algorithm [17] is an early but popular solution, which
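As a point of reference, the following compact sketch implements the core of Ullmann's technique as it is commonly described. It starts from candidate sets for each query vertex, initialised by label and degree. These sets are pruned by a refinement step, in which a host vertex j stays a candidate for query vertex i only if every neighbour of i still has a candidate adjacent to j. The remaining mappings are then explored by backtracking. The sketch assumes undirected, simple, labelled graphs given as adjacency sets and uses dictionaries of sets instead of the original bit matrices. The names are illustrative.

```python
from typing import Dict, List, Set


def refine(M: Dict[str, Set[str]], Q: Dict[str, Set[str]], H: Dict[str, Set[str]]) -> bool:
    """Remove j from M[i] if some query neighbour x of i has no candidate
    adjacent to j. Repeats until stable; returns False if a set becomes empty."""
    changed = True
    while changed:
        changed = False
        for i in M:
            for j in list(M[i]):
                if any(not (M[x] & H[j]) for x in Q[i]):
                    M[i].discard(j)
                    changed = True
            if not M[i]:
                return False
    return True


def ullmann(Q: Dict[str, Set[str]], H: Dict[str, Set[str]],
            qlabel: Dict[str, str], hlabel: Dict[str, str]) -> List[Dict[str, str]]:
    """All injective, label- and edge-preserving mappings of query Q into host H."""
    order = sorted(Q)
    M = {i: {j for j in H if hlabel[j] == qlabel[i] and len(H[j]) >= len(Q[i])} for i in Q}
    results: List[Dict[str, str]] = []

    def search(d: int, M: Dict[str, Set[str]]) -> None:
        if d == len(order):
            # every candidate set is now a singleton
            results.append({i: next(iter(M[i])) for i in order})
            return
        i = order[d]
        for j in sorted(M[i]):
            M2 = {k: ({j} if k == i else set(v)) for k, v in M.items()}
            for k in order[d + 1:]:
                M2[k].discard(j)  # injectivity
            if refine(M2, Q, H):
                search(d + 1, M2)

    if refine(M, Q, H):
        search(0, M)
    return results


Q = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}               # triangle
H = {"1": {"2", "3", "4"}, "2": {"1", "3"}, "3": {"1", "2", "4"}, "4": {"1", "3"}}
print(len(ullmann(Q, H, dict.fromkeys(Q, "P"), dict.fromkeys(H, "P"))))  # 12
```

The host contains two triangles, and each can be matched in six ways. This yields twelve mappings, because automorphic embeddings are reported separately.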
Conclusion
In this paper, we developed an approach for graph pattern matching based on a Search Model representation of a graph query and a generalised version of the algorithm from [4]. We then integrated an existing static technique [5] with our dynamic algorithm and extended the resulting hybrid strategy by a more sophisticated handling of filtering effects. In addition, we considered a variant that takes a more pessimistic approach to computing required static cost estimates and an extension which
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (25)
- Increasing tree search efficiency for constraint satisfaction problems. Artif. Intell. (1980).
- A comparison of current graph database models.
- Hybrid search plan generation for generalized graph pattern matching.
- On the operationalization of graph queries with generalized discrimination networks.
- Improved flexibility and scalability by interpreting story diagrams. Electron. Commun. EASST (2009).
- An algorithm for generating model-sensitive search plans for EMF models.
- The LDBC social network benchmark: interactive workload.
- GrGen: a fast SPO-based graph rewriting tool.
- Fundamentals of Algebraic Graph Transformation (2006).
- Efficient subgraph matching by postponing Cartesian products.
- Henshin: advanced concepts and tools for in-place EMF model transformations.
- A performance comparison of five algorithms for graph isomorphism.
☆ Funding: This work was supported by the Deutsche Forschungsgemeinschaft (grant number GI 765/8-1).