Compact structure for sparse undirected graphs based on a clique graph partition
Introduction
A wide variety of real systems are modeled by graphs, including communication, transit, web, social, and biological networks [1], [2]. The process of discovering relevant information from graphs is called graph mining [3]. This is usually a time-consuming task, especially given the current growth in data size [4]. The main challenges stem from the data volume itself, data complexity (i.e., many relationships among the data), and application needs [4]. Several schemes have been proposed for analyzing graphs; they aim at understanding the properties and patterns found in them to serve different application purposes. Some known applications include disease analysis [5], community discovery [1], [2], [6], recommender systems [7], graph compression [8], [9], [10], measuring relevance of network actors [11], [12], and network visualization [13], [14]. Recent works on graph mining postulate that dense patterns are prominent and describe different dense substructures. Some examples include maximal cliques [15], [16], communities [17], and others [3], [9], [18], [19]. These substructures have been used for improving network analysis, graph compression [9], [20], and visualization [13].
Given the space required to store and analyze large graphs, the research community has proposed graph compression formats that support basic navigation queries directly over the compressed structure without requiring decompression. This approach enables the simulation of any graph algorithm in main memory, requiring less space than plain representations. Even though these compressed structures are usually slower than uncompressed representations, they are still attractive on devices with limited memory, such as tablets or cell phones. Moreover, these in-memory representations can provide faster access than plain representations that incur I/O costs [21], [22].
Although there are different types of real-world graphs of interest, this work aims at processing highly clustered and sparse graphs. Clustered graphs contain vertices grouped in highly connected subgraphs. These graphs have high clustering coefficient and transitivity [23] measures. In practice, many real-world graphs are sparse, for example graphs with low degeneracy [16].
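The two properties named above can be made concrete with a small sketch. It assumes a graph stored as a dictionary of neighbor sets; the function names `transitivity` and `degeneracy` are illustrative and are not taken from the paper's implementation.

```python
from itertools import combinations

def transitivity(adj):
    """Global transitivity: 3 * (number of triangles) / (number of connected triples)."""
    closed = 0   # closed triples; each triangle is counted once per center vertex
    triples = 0  # paths of length two, counted by their center vertex
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

def degeneracy(adj):
    """Smallest k such that every subgraph has a vertex of degree at most k."""
    remaining = {v: set(n) for v, n in adj.items()}
    k = 0
    while remaining:
        # Repeatedly peel off a vertex of minimum remaining degree.
        v = min(remaining, key=lambda u: len(remaining[u]))
        k = max(k, len(remaining[v]))
        for u in remaining.pop(v):
            remaining[u].discard(v)
    return k

# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(transitivity(adj))  # 0.6
print(degeneracy(adj))    # 2
```

The quadratic-time peeling loop is chosen for brevity; a bucket-based ordering would be preferable for large graphs.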
This work proposes a compact data structure for clustered sparse undirected graphs that exploits the cliques to represent the edges implicitly. Further, it makes use of the vertex redundancy of the cliques by partitioning them into components that share many vertices. This structure enables neighbor queries, as well as queries for recovering all or subsets of the maximal cliques. Finding maximal cliques is an important step in the clique percolation method (CPM), which has been successfully used for community searches in biological networks [1], social group evolution [24], human disease pattern discovery [5], and computing and visualizing topological features using persistent homology in network analysis [14].
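The idea of representing edges implicitly through cliques can be illustrated with a minimal sketch. It uses plain Python lists and an inverted index rather than the compact structures of the paper; the names `build_index` and `neighbors` are hypothetical.

```python
from collections import defaultdict

def build_index(cliques):
    """Map each vertex to the ids of the cliques that contain it."""
    index = defaultdict(list)
    for cid, clique in enumerate(cliques):
        for v in clique:
            index[v].append(cid)
    return index

def neighbors(v, cliques, index):
    """Neighbors of v are all other vertices that share a clique with v."""
    result = set()
    for cid in index.get(v, []):
        result.update(cliques[cid])
    result.discard(v)
    return sorted(result)

# No edge list is stored: the three cliques imply all seven edges.
cliques = [(0, 1, 2), (2, 3), (3, 4, 5)]
index = build_index(cliques)
print(neighbors(2, cliques, index))  # [0, 1, 3]
print(neighbors(3, cliques, index))  # [2, 4, 5]
```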
The structure is built on a partition of the clique graph, where each node is a maximal clique in the original graph. The proposed method uses a fast algorithm for listing all maximal cliques and defines an effective heuristic for finding a clique graph partition without constructing the clique graph itself. From this, a compact representation of the partitioned clique graph is proposed.
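As a rough illustration of these two objects, the sketch below lists maximal cliques with the classic Bron-Kerbosch algorithm with pivoting, which is not necessarily the listing algorithm used by the authors. It then builds the clique graph explicitly, assuming the usual definition in which two cliques are adjacent when they share at least one vertex. The proposed method avoids this explicit construction; it appears here only to make the definition concrete. All function names are illustrative.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields maximal cliques of size >= 2 as sorted tuples."""
    def expand(r, p, x):
        if not p and not x:
            if len(r) >= 2:
                yield tuple(sorted(r))
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    yield from expand(set(), set(adj), set())

def clique_graph(cliques):
    """Edges (i, j) between cliques that share at least one vertex (assumed definition)."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(cliques), 2):
        if set(a) & set(b):
            edges.add((i, j))
    return edges

# Two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
cliques = sorted(maximal_cliques(adj))
print(cliques)                        # [(0, 1, 2), (2, 3), (3, 4, 5)]
print(sorted(clique_graph(cliques)))  # [(0, 1), (1, 2)]
```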
The experimental evaluation shows that the compressed graph representation is competitive with the state-of-the-art methods in terms of compression efficiency for large real graphs, obtaining the smallest representation for clustered graphs. This high compression is achieved, in some cases, at the expense of slower access times when answering neighbor queries. As discussed, in a context of limited memory or steep memory hierarchies, using less space can be of special interest. It may allow the representation to fit into faster memory levels and, in the case of larger datasets, avoids having to handle it on slower ones, such as disks [21], [22]. In addition, to the best of our knowledge, the structure presented in this study is the first proposal that supports maximal clique queries besides neighbor queries. This is an important operation for applications that use clique communities. Furthermore, retrieving maximal cliques from the compressed representation is much faster than listing them from the original graph.
The implementation of the proposed method is available at http://www.inf.udec.cl/~chernand/sources/cliquecomp/cliquecomp.tgz.
Section snippets
Related work
Boldi and Vigna [25] proposed in 2004 one of the best-known techniques for web graph compression, which offered the best space/time trade-off for many years. They presented the WebGraph framework, which obtains very compact representations of web graphs by exploiting their regularities and statistical properties. More concretely, they exploit the locality of reference, since web pages generally include links to other web pages of the same domain. They also exploited the similarity of the
Proposed method
This section describes a new method for compressing real sparse undirected graphs using a compact data structure that takes advantage of the vertex redundancy of the graph represented by its maximal cliques. In this method, vertex redundancy refers to vertices that belong to multiple maximal cliques. Such vertices can be stored only once to reduce space.
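Vertex redundancy can be quantified with a short sketch. It assumes the maximal cliques are given as tuples of vertex ids; the function name `vertex_redundancy` and the toy cliques are illustrative only.

```python
from collections import Counter

def vertex_redundancy(cliques):
    """Count, for every vertex, how many cliques it appears in."""
    counts = Counter(v for clique in cliques for v in clique)
    shared = {v: c for v, c in counts.items() if c > 1}
    return counts, shared

# Three overlapping cliques: vertices 1, 2 and 3 are repeated.
cliques = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 5)]
counts, shared = vertex_redundancy(cliques)
print(sum(counts.values()))  # 11 vertex occurrences if every clique is stored separately
print(len(counts))           # 6 distinct vertices
print(shared)                # {1: 2, 2: 3, 3: 3}
```

The gap between the 11 stored occurrences and the 6 distinct vertices is the redundancy that a shared representation can remove.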
The proposed compression method includes three steps. The first step (clique listing) lists all the maximal cliques of size at least two in the
Query algorithms
This section describes how the main queries are solved using the compact data structure. Algorithm 2 displays a sequential algorithm that retrieves the input graph G in a single pass. The time complexity of the sequential algorithm is .
The algorithm goes through each partition p of the compact representation, retrieves all of the edges in that partition and adds all those edges to build E. If a partition contains only one clique, then all the possible edges are
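The single-pass reconstruction described here can be sketched as follows. The sketch assumes each partition is a plain list of cliques, whereas Algorithm 2 operates on the compact encoding; the names `edges_of_partition` and `rebuild_graph` are hypothetical.

```python
from itertools import combinations

def edges_of_partition(partition):
    """All edges implied by the cliques of one partition, each emitted once."""
    edges = set()
    for clique in partition:
        # A clique implies every possible edge among its vertices.
        edges.update(combinations(sorted(clique), 2))
    return edges

def rebuild_graph(partitions):
    """Visit every partition once and accumulate its edges into E."""
    E = set()
    for p in partitions:
        E |= edges_of_partition(p)
    return E

# The first partition holds two cliques that share the edge (1, 2).
partitions = [[(0, 1, 2), (1, 2, 3)], [(3, 4)]]
print(sorted(rebuild_graph(partitions)))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
```

Using a set means an edge covered by several cliques is kept only once, at the cost of extra memory during reconstruction.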
Experimental evaluation
This section describes several experiments to tune our method and compare it with the state-of-the-art algorithms for compressing graphs, including version 3.6.1 of WebGraph (WG) [33], the graph compression by BFS from Apostolico and Drovandi (AD) [32], and the k²-tree [20]. The compression results reported by Rossi and Zhou for GraphZIP [36] are also included, although GraphZIP does not support query operations. All of the experiments ran on a machine with an Intel i7-7500U CPU @
Conclusions
This work introduces a new compact representation of real sparse and clustered undirected graphs based on clique graph partitioning. The method first lists all the maximal cliques of the input graph. Then, it defines a clique graph, whose vertices are the cliques in the original graph. Next, it finds a partition of the clique graph, which is finally encoded in a compressed form using compact data structures.
Our method includes an effective heuristic to find a partition in the clique graph, by
CRediT authorship contribution statement
Felipe Glaria: Conceptualization, Writing - original draft, Software, Visualization. Cecilia Hernández: Conceptualization, Formal analysis, Writing - original draft, Software, Visualization, Writing - review & editing. Susana Ladra: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Gonzalo Navarro: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Lilian Salinas: Formal analysis.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie [grant agreement No 690941]; from the Ministerio de Economía y Competitividad (PGE and ERDF) [Grant No. TIN2016-77158-C4-3-R]; from Xunta de Galicia (co-funded with ERDF) [Grant Nos. ED431C 2017/58; ED431G 2019/01]; from the Center for Biotechnology and Bioengineering (CeBiB), Chile; and from the Millennium Institute for Foundational Research on Data
References (46)
- et al. Disease global behavior: A systematic study of the human interactome network reveals conserved topological features among categories of diseases. Inform. Med. Unlocked (2019)
- Applying the clique percolation method to analyzing cross-market branch banking network structure: the case of Illinois. Social Network Anal. Mining (2016)
- et al. A divide and agglomerate algorithm for community detection in social networks. Inf. Sci. (2019)
- et al. Compact representation of web graphs with extended functionality. Inform. Syst. (2014)
- et al. Graph structure in the Web. Comput. Netw. (2000)
- A partial characterization of clique graphs. J. Comb. Theory (1968)
- et al. A characterization of clique graphs. J. Comb. Theory, Series B (1971)
- et al. The wavelet matrix: An efficient wavelet tree for large alphabets. Inform. Syst. (2015)
- et al. Uncovering the overlapping community structure of complex networks in nature and society. Nature (2005)
- Jianxin Li, Xinjue Wang, Ke Deng, Xiaochun Yang, Timos Sellis, and Jeffrey Xu Yu. Most influential community search...
- A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev.
- Managing and mining large graphs: systems and implementations
- A scalable pattern mining approach to Web graph compression with communities
- Compressed representations for web and social graphs. Knowl. Inf. Syst.
- Compressing networks with super nodes. Sci. Rep.
- Meta structure: Computing relevance in large heterogeneous information networks
- The network data repository with interactive graph analytics and visualization
- Clique community persistence: A topological visual analysis approach for complex networks. IEEE Trans. Visualization Computer Graphics
- New algorithms for enumerating all maximal cliques
- The k-clique densest subgraph problem