Elsevier

Information Sciences

Volume 572, September 2021, Pages 277-296
Exploring cohesive subgraphs with vertex engagement and tie strength in bipartite graphs

https://doi.org/10.1016/j.ins.2021.04.027

Abstract

We propose a novel cohesive subgraph model called τ-strengthened (α,β)-core (denoted as (α,β)τ-core), which is the first to consider both tie strength and vertex engagement on bipartite graphs. An edge is a strong tie if it is contained in at least τ butterflies (2×2-bicliques). The (α,β)τ-core requires each vertex on the upper or lower level to have at least α or β strong ties, respectively, given strength level τ. To retrieve the vertices of an (α,β)τ-core optimally, we construct the index Iα,β,τ to store all (α,β)τ-cores. Effective optimization techniques are proposed to improve index construction. To make our idea practical on large graphs, we propose the 2D-indexes Iα,β, Iβ,τ, and Iα,τ, which selectively store the vertices of (α,β)τ-cores for some α, β, and τ. The 2D-indexes are more space-efficient and require less construction time, and each of them can support (α,β)τ-core queries. As query efficiency depends on input parameters and the choice of 2D-index, we propose a learning-based hybrid computation paradigm by training a feed-forward neural network to predict the optimal choice of 2D-index that minimizes the query time. Extensive experiments show that (1) (α,β)τ-core is an effective model capturing unique and important cohesive subgraphs; (2) the proposed techniques significantly improve the efficiency of index construction and query processing.

Introduction

Bipartite graphs are widely used to represent networks with two different groups of entities such as user-item networks [1], author-paper networks [2], and member-activity networks [3]. In bipartite graphs, cohesive subgraph mining has numerous applications including fraudster detection [4], [5], [6], group recommendation [7], [8], and discovering inter-corporate relations [9], [10].

(α,β)-core and k-bitruss are two representative cohesive subgraph models in bipartite graphs, extended from the unipartite k-core [11] and k-truss [12] models. The (α,β)-core is the maximal subgraph of a bipartite graph G such that the vertices on the upper or lower layer have at least α or β neighbors, respectively. (α,β)-core models vertex engagement as degrees and treats each edge equally, but ties (edges) in real networks have different strengths. The k-bitruss is the maximal subgraph where each edge is contained in at least k butterflies (i.e., 2×2-bicliques), which can model the tie strength [13], [14].
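The butterfly count of an edge, which both k-bitruss and the model studied here rely on, can be illustrated with a short Python sketch. It is a naive pairwise count written for this illustration, not the counting algorithm of the cited works; the function name `butterfly_support` and the edge-list input of (upper, lower) pairs are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def butterfly_support(edges):
    """Return {(u, v): number of butterflies (2x2-bicliques) containing (u, v)}.

    `edges` is an iterable of (upper_vertex, lower_vertex) pairs.
    Naive sketch: quadratic in the number of upper vertices.
    """
    edges = set(edges)
    nbrs = defaultdict(set)  # upper vertex -> set of lower neighbours
    for u, v in edges:
        nbrs[u].add(v)

    support = {e: 0 for e in edges}
    # Two upper vertices with c common lower neighbours span C(c, 2)
    # butterflies; each edge (u, v) with v in the common set lies in
    # c - 1 of them (v paired with every other common neighbour).
    for u1, u2 in combinations(list(nbrs), 2):
        common = nbrs[u1] & nbrs[u2]
        c = len(common)
        if c < 2:
            continue
        for v in common:
            support[(u1, v)] += c - 1
            support[(u2, v)] += c - 1
    return support
```

For instance, in a complete 2×3 bipartite graph every edge lies in two butterflies, while a pendant edge attached to it lies in none.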

In the author-paper network shown in Fig. 1, the whole graph is the (α,β)-core (α=2, β=2) and the light blue region is the k-bitruss (k = 2). Without considering tie strength, (α,β)-core blindly includes research groups of different levels of cohesiveness. We can see that v0 and v1 are not as closely connected as the rest of the authors. The k-bitruss model can exclude the relatively sparse subgraph containing v0 and v1, but it also deletes edges (u3,v4) and (u4,v3) even though their incident vertices are present. This exposes the drawbacks of the k-bitruss model: (1) As k-bitruss only keeps strong ties, the weak ties between important vertices are missed. In Fig. 1, it fails to recognize the contributions of authors v3, v4 in papers u3, u4. (2) After removing weak ties, the tie strengths are modeled inaccurately. Edges (u3,v3) and (u4,v4) have more supporting butterflies (u3, u4, v3, v4 form a butterfly) than (u1,v2), but their tie strengths are modeled as equal.

In this paper, we study the efficient and scalable computation of τ-strengthened (α,β)-core, which is the first cohesive subgraph model on bipartite graphs to consider both tie strength and vertex engagement. Given a bipartite graph G, we model the tie strength of each edge as the number of butterflies containing it. With a strength level τ, we consider the edges with tie strength no less than τ to be strong ties. The engagement of a vertex is modeled as the number of strong ties to which it is incident. Given engagement constraints α and β and a strength level τ, the (α,β)τ-core is the maximal subgraph of G such that each upper or lower vertex in the subgraph has at least α or β strong ties, respectively. The (α,β)τ-core model is highly flexible and is able to capture unique structures. For instance, in Fig. 1, the subgraph induced by vertices {u1,u2,u3,u4,u5,v2,v3,v4,v5,v6} is the (2,2)2-core, which cannot be found by (α,β)-core or k-bitruss for any α, β, or k. Also, as shown in Fig. 1, (α,β)-core can preserve the weak ties if the incident vertices are present (e.g., the red edges are preserved due to u3, u4, v3, and v4), which better resembles reality. The flexibility of the (α,β)τ-core model is also evaluated in another experiment conducted on the dataset DBpedia-producer. Fig. 2 shows the subgraphs of different densities found by (α,β)τ-core and (α,β)-core, where density is the ratio between the number of existing edges and the number of all possible edges [13]. 165 subgraphs with a density greater than 0.2 are found by (α,β)τ-core, while only 9 such subgraphs are found by (α,β)-core.
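The density measure mentioned above can be written out as follows. This is a sketch under the assumption that "all possible edges" of a bipartite subgraph S means every upper-lower vertex pair; the symbols U(S), L(S), and E(S) mirror the U(G), L(G), and E(G) notation of the problem definition.

```latex
\[
  \mathrm{density}(S) \;=\; \frac{|E(S)|}{|U(S)| \cdot |L(S)|}
\]
```

Under this reading, a subgraph with 3 upper vertices, 4 lower vertices, and 6 edges has density 6/12 = 0.5.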

Applications. The τ-strengthened (α,β)-core model has many applications. We list some of them below.

  • Identify nested communities. On Internet forums like Reddit, Quora, and StackOverflow, users hold conversations on topics that interest them. The users and the topics form a bipartite network. In these networks, communities naturally exist and are nested. For instance, Reddit displays a list of top communities like “News”, “Gaming”, and “Sports” on the front page. The “Sports” community contains many sub-communities, including “Cricket”, “Bicycling”, and “Golf”. The edges in sub-communities have higher tie strength because users and topics within them are more closely connected. By increasing the strength level τ, the (α,β)τ-core captures subgraphs that form a hierarchy, which can model nested communities on bipartite networks.

  • Group similar users and items. On online shopping platforms like Amazon, eBay, and Alibaba, users and items form a bipartite graph, where each edge indicates a purchasing record. Such a network consists of many closely connected communities, where the same group of users repeatedly buy some items. Examples of such communities include children-toy communities, student-stationery communities, and patient-medicine communities. Within one community, items are considered more similar, and users tend to be alike due to their everyday shopping habits. As the edges between these users and items have high tie strength (butterfly support), we can use (α,β)τ-core to find these communities and group similar users or items together.

Challenges. To obtain the (α,β)τ-core from the input graph, we can first compute the supports of edges and the engagements of vertices and then iteratively delete the vertices not meeting the engagement constraints. Even when α, β, and τ are large and the resulting (α,β)τ-core is small, computing it from the input graph is time-consuming. Thus, the online computation method cannot support a large number of (α,β)τ-core queries.
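The online procedure described above can be sketched in Python as follows. This is a minimal illustration, not the paper's Algorithm 1: it assumes that tie strength is measured inside the remaining subgraph and recomputes all supports from scratch in every round instead of updating them incrementally. The names `strong_tie_counts` and `strengthened_core` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def strong_tie_counts(edges, tau):
    """Per-vertex number of incident edges lying in at least tau butterflies."""
    nbrs = defaultdict(set)  # upper vertex -> lower neighbours
    for u, v in edges:
        nbrs[u].add(v)
    support = defaultdict(int)  # edge -> butterfly count (naive pairwise count)
    for u1, u2 in combinations(list(nbrs), 2):
        common = nbrs[u1] & nbrs[u2]
        for v in common:
            support[(u1, v)] += len(common) - 1
            support[(u2, v)] += len(common) - 1
    upper_cnt, lower_cnt = defaultdict(int), defaultdict(int)
    for u, v in edges:
        if support[(u, v)] >= tau:
            upper_cnt[u] += 1
            lower_cnt[v] += 1
    return upper_cnt, lower_cnt


def strengthened_core(edges, alpha, beta, tau):
    """Edge set of the subgraph left after peeling weakly engaged vertices.

    `edges` holds (upper, lower) pairs. An edge survives a round only if
    both endpoints meet their engagement constraint, which is equivalent
    to deleting every failing vertex. Weak ties between surviving
    vertices are kept.
    """
    edges = set(edges)
    while edges:
        upper_cnt, lower_cnt = strong_tie_counts(edges, tau)
        kept = {(u, v) for u, v in edges
                if upper_cnt[u] >= alpha and lower_cnt[v] >= beta}
        if kept == edges:  # fixed point: nothing left to peel
            break
        edges = kept
    return edges
```

Because supports can only shrink as vertices are removed, repeating the round until nothing changes reaches the maximal subgraph satisfying the constraints. The full recomputation per round is what makes this sketch slow on large inputs.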

In this paper, we resort to index-based approaches. A straightforward solution is to compute all possible (α,β)τ-cores and build a total index Iα,β,τ based on them. Instead of computing every (α,β)τ-core from the input graph, we take advantage of the nested property of the (α,β)τ-core: if α ≥ α′, β ≥ β′, and τ ≥ τ′, then the (α,β)τ-core is a subgraph of the (α′,β′)τ′-core. Specifically, for all possible α and β, we first find the (α,β)1-core and then compute the (α,β)τ-core while gradually increasing the strength level τ. In this manner, we can compute all (α,β)τ-cores and construct the index Iα,β,τ. Although Iα,β,τ supports optimal retrieval of the vertex set of any (α,β)τ-core, it still suffers from long construction time on large graphs. To devise more practical index-based approaches, we face the following challenges.

  • 1. When building the index Iα,β,τ, it is time-consuming to enumerate all butterflies containing the deleted edges. Also, the Iα,β,τ index construction algorithm is prone to visiting the same (α,β)τ-core subgraph repeatedly, as it can correspond to different combinations of α, β, and τ. It is a challenge to speed up butterfly enumeration and avoid repeatedly visiting the same subgraphs while constructing the total index Iα,β,τ.

  • 2. Due to the flexibility of the (α,β)τ-core model, there are a large number of (α,β)τ-cores corresponding to different combinations of α, β, and τ. The time cost of indexing all (α,β)τ-cores becomes unaffordable on large graphs. It is also a challenge to balance building space-efficient indexes and supporting efficient and scalable query processing.

Our approaches. To address the first challenge, we extend the butterfly enumeration techniques in [15] and propose novel computation-sharing optimizations to speed up the construction of Iα,β,τ. Specifically, we build the Bloom-Edge-Index (hereafter BE-Index) proposed in [15] to quickly fetch the butterflies containing an edge. The BE-Index captures the relationships between edges and (2×k)-bicliques (also called blooms). When deleting an edge, we can quickly locate the blooms containing this edge in the BE-Index and update the supports of the affected edges in these blooms accordingly. In addition, the computation-sharing optimization is based on the fact that the same (α,β)τ-core subgraph can correspond to various parameter combinations. If we find that the vertices of a subgraph have already been recorded, we can skip the current parameter combination.

To address the second challenge, we introduce the space-efficient 2D-indexes Iα,β, Iβ,τ, and Iα,τ, and train a feed-forward neural network to predict the most promising index to handle an (α,β)τ-core query. Instead of indexing all (α,β)τ-cores, the 2D-indexes Iα,β, Iβ,τ, and Iα,τ store the vertex sets of all (α,β)-cores, (1,β)τ-cores, and (α,1)τ-cores, respectively. These 2D-indexes are much smaller in size and require significantly less build time, and each of them can be used to handle (α,β)τ-core queries. For example, to compute the (α,β)τ-core using Iβ,τ, we fetch the vertices in the (1,β)τ-core and recover its edges. Then, we iteratively remove the vertices that do not have enough engagement from the (1,β)τ-core until we find the (α,β)τ-core. However, the query processing performance based on each 2D-index is highly sensitive to the parameters α, β, and τ. This is because the 2D-indexes only store the vertices in the (α,β)-core, (1,β)τ-core, and (α,1)τ-core, and the size difference between the (α,β)τ-core and each of these subgraphs is uncertain. We also observe no simple rules to partition the parameter space so that queries from each partition can be efficiently handled by one type of index. This motivates us to resort to machine learning techniques and train a feed-forward neural network as the classifier to predict which index to use for each incoming (α,β)τ-core query. Since we aim to minimize the query time instead of maximizing accuracy, we propose a scoring function, time-sensitive-error, to tune the hyper-parameters of the classifier. The experiment results show that the resulting hybrid computation algorithm significantly outperforms the query processing algorithms based on Iα,β, Iβ,τ, and Iα,τ alone, and it is less sensitive to varying parameters.
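The Iβ,τ query path described above (fetch the stored vertices, recover the edges, then peel) can be sketched in Python as follows. Everything here is an illustrative assumption rather than the paper's data structure: the dictionary layout of the index, the names `peel`, `build_beta_tau_index`, and `query`, and the naive peeling routine that recomputes butterfly supports inside the current subgraph in each round.

```python
from collections import defaultdict
from itertools import combinations


def peel(edges, alpha, beta, tau):
    """Naive (alpha, beta)_tau-core peeling; returns the surviving edge set."""
    edges = set(edges)
    while edges:
        nbrs = defaultdict(set)  # upper vertex -> lower neighbours
        for u, v in edges:
            nbrs[u].add(v)
        support = defaultdict(int)  # edge -> butterfly count
        for u1, u2 in combinations(list(nbrs), 2):
            common = nbrs[u1] & nbrs[u2]
            for v in common:
                support[(u1, v)] += len(common) - 1
                support[(u2, v)] += len(common) - 1
        up, low = defaultdict(int), defaultdict(int)  # strong ties per vertex
        for u, v in edges:
            if support[(u, v)] >= tau:
                up[u] += 1
                low[v] += 1
        kept = {(u, v) for u, v in edges if up[u] >= alpha and low[v] >= beta}
        if kept == edges:
            break
        edges = kept
    return edges


def build_beta_tau_index(edges, max_beta, max_tau):
    """Hypothetical I_{beta,tau}: (beta, tau) -> vertex sets of the (1, beta)_tau-core."""
    index = {}
    for beta in range(1, max_beta + 1):
        for tau in range(1, max_tau + 1):
            core = peel(edges, 1, beta, tau)
            index[(beta, tau)] = ({u for u, _ in core}, {v for _, v in core})
    return index


def query(edges, index, alpha, beta, tau):
    """Answer an (alpha, beta)_tau-core query starting from the stored vertices."""
    upper, lower = index[(beta, tau)]
    # Recover the edges of the (1, beta)_tau-core from its vertex sets.
    candidate = {(u, v) for u, v in edges if u in upper and v in lower}
    # Peel the remaining weakly engaged vertices.
    return peel(candidate, alpha, beta, tau)
```

The query cost of this path depends on how much larger the stored (1,β)τ-core is than the requested (α,β)τ-core, which is the sensitivity to parameters noted above.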

Contribution. Our major contributions are summarized here:

  • We propose the first cohesive subgraph model τ-strengthened (α,β)-core on bipartite graphs, which considers both tie strength and vertex engagement. The flexibility of our model allows it to capture unique and useful structures on bipartite graphs.

  • We construct index Iα,β,τ to support optimal retrieval of the vertex set of any (α,β)τ-core. We also devise computation sharing and BE-Index based optimizations to reduce its construction time effectively.

  • We build 2D-indexes that are more space-efficient and require significantly less build time. We propose a learning-based hybrid computation paradigm to predict which index to choose to minimize the response time for an incoming (α,β)τ-core query.

  • We validate the efficiency of proposed algorithms and the effectiveness of our model through extensive experiments on real-world datasets. Results show that the 2D-indexes are scalable, and the hybrid computation algorithm on a well-trained neural network can outperform the algorithms based on each 2D-index alone.

Organization. The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 summarizes important notations and definitions and introduces (α,β)-core and τ-strengthened (α,β)-core. Section 4 presents the online computation algorithm. Sections 5 and 6 present the total index Iα,β,τ and the optimizations of its construction process. Section 7 presents the learning-based hybrid computation paradigm. Section 8 shows the experimental results, and Section 9 concludes the paper.

Section snippets

Related work

In the literature, there are many recent studies on cohesive subgraph models on both unipartite graphs and bipartite graphs.

Unipartite graphs. k-core [11], [16], [17], [18] and k-truss [12], [19], [20] are two of the most well-known cohesive subgraph models on general, unipartite graphs. On a unipartite graph, k-core is the maximal subgraph such that each vertex in the subgraph has at least k neighbors. k-core models vertex engagement as degrees and assumes the importance of each tie to be

Problem definition

In this section, we formally define our cohesive subgraph model τ-strengthened (α,β)-core. We consider an unweighted, undirected bipartite graph G(V,E). V(G) = U(G) ∪ L(G) denotes the set of vertices in G, where U(G) and L(G) represent the upper and lower layer, respectively. E(G) ⊆ U(G) × L(G) denotes the set of edges in G. We use n = |V(G)| to denote the number of vertices and m = |E(G)| to denote the number of edges. The maximum degree in the upper and lower layer is denoted as dmax(U) and dmax(L),

The online computation algorithm

Given engagement constraints, α,β, and a strength level τ, the online algorithm to compute the (α,β)τ-core is outlined in Algorithm 1. First, we compute the support of each edge e using the algorithm in [34] and count how many strong ties each vertex u has. Once the strong ties are identified, the upper vertices with fewer than α strong ties and the lower vertices with fewer than β strong ties are the weakly-engaged vertices. Then, Algorithm 2 is invoked to iteratively remove these weakly

The decomposition based total index

Given α,β, and τ, Algorithm 1 computes the (α,β)τ-core from the input graph, which is slow and cannot handle a large number of queries. In this section, we present a decomposition algorithm that retrieves all (α,β)τ-cores, and we build a total index based on the decomposition output to support efficient query processing.

Algorithm 3: Decomposition

The decomposition algorithm. The following lemma is immediate based on Definition 5, which depicts the nested relationships among (α,β)τ-cores.

Lemma 2

(α,β)τ

Optimizations of index construction

The above decomposition algorithm has two issues: (1) The same subgraph can be computed repeatedly for different α and β values. For example, if the (1,1)τ-core is the same subgraph as the (1,2)τ-core, then we will compute it twice, once when β=1 and once when β=2. (2) When removing an edge e, we need to enumerate all the butterflies containing e. The basic implementation of butterfly enumeration is inefficient: it finds three connected vertices first and then checks if a fourth vertex can form a butterfly with

A learning-based hybrid computation paradigm

Although the index Iα,β,τ supports the optimal retrieval of the vertices in the queried (α,β)τ-core, it does not scale well to large graphs due to its long build time and large space complexity even with the related optimizations. For instance, on datasets Team, Wiki-en, Amazon, and DBLP, the index Iα,β,τ cannot be built within two hours as evaluated in our experiments. In this section, we present 2D-indexes that selectively store the vertices of (α,β)τ-core for some combinations of α,β, and τ.

Experiments

In this section, we first validate the effectiveness of the τ-strengthened (α,β)-core model. Then, we evaluate the performance of the index construction algorithms as well as the query processing algorithms.

Conclusion

In this paper, we introduce a novel cohesive subgraph model, τ-strengthened (α,β)-core, which is the first to consider both tie strength and vertex engagement on bipartite graphs. We propose a decomposition-based index Iα,β,τ that can retrieve the vertices of any (α,β)τ-core in optimal time. We also apply computation sharing and BE-Index-based optimizations to speed up the index construction process of Iα,β,τ. To balance space-efficient index construction and time-efficient query processing, we

CRediT authorship contribution statement

Yizhang He: Writing - original draft, Methodology, Software. Kai Wang: Conceptualization, Methodology, Investigation. Wenjie Zhang: Conceptualization, Methodology, Writing - original draft. Xuemin Lin: Supervision. Ying Zhang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Xuemin Lin is supported by the National Key R&D Program of China under grant 2018AAA0102502 and ARC DP200101338. Wenjie Zhang is supported by ARC DP210101393 and ARC DP200101116. Ying Zhang is supported by FT170100128 and ARC DP180103096.

References (39)

  • D. Ding et al., Efficient fault-tolerant group recommendation using alpha-beta-core

  • E. Ntoutsi et al., Fast group recommendations by applying user clustering

  • D. Palmer, Interlocking directorates and intercorporate coordination, Social Networks: Critical Concepts Sociol. (2002)

  • J. Cohen, Trusses: Cohesive subgraphs for social network analysis, National Security Agency Tech. Rep. (2008)

  • A.E. Sarıyüce et al., Peeling bipartite networks for dense subgraph discovery

  • Z. Zou, Bitruss decomposition of bipartite graphs

  • K. Wang et al., Efficient bitruss decomposition for large-scale bipartite graphs

  • J. Cheng et al., Efficient core decomposition in massive networks

  • W. Khaouid et al., K-core decomposition of large networks on a single pc, Proc. VLDB Endowment (2015)
    1

    This manuscript is the authors’ original work and has not been published nor has it been submitted simultaneously elsewhere.

    2

    All authors have checked the manuscript and have agreed to the submission.
