Introduction

Community detection is a fundamental problem in network analysis: community structure, which exists in almost all networks, is among the most widely studied structural properties of networks.

The statistical network generative model, owing to its solid theoretical foundation, remarkable interpretability, and relative tractability, has been widely used for community detection tasks [1]. Existing network generative models can be grouped into two classes: latent class models and latent feature models. Latent class models assume that each individual affiliates with only a single class (as shown in Fig. 1a). Latent feature models increase the flexibility of the generative process by permitting each object to possess a vector of features and determining the link probabilities from interactions among the features. In many real-world networks, communities ordinarily overlap rather than being disjoint, so assuming that each object has hard membership in only one cluster is too restrictive to be consistent with the facts.

Fig. 1 Binary matrices indicating nodes' community affiliations

An important challenge in community detection is that the number of communities must usually be specified in advance, even though we rarely have good prior knowledge of how many parameters the model requires to explain the data well. The relational infinite latent feature model (rILFM), in which the number of latent variables is unbounded, is a flexible Bayesian nonparametric approach well suited to this situation, as its number of parameters can grow as more data are observed.

The Indian buffet process (IBP) [2] is often used to construct the overlapping community assignment matrix, in which each object is represented by a sparse subset of an unbounded number of features; this leads to a Bayesian nonparametric version of the latent feature model.

As shown in Fig. 1b, the set of features possessed by a set of objects can be expressed as a binary matrix Z with infinitely many columns and exchangeable rows, where the ith row corresponds to an object and the kth column corresponds to a feature; \(z_{ik}=1\) indicates that object i possesses feature k. The infinite binary matrix Z can express that each individual is characterized by a set of features or, equivalently, that each individual belongs to multiple communities simultaneously, which is intuitively called overlapping community structure.
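To make the encoding concrete, the following toy example (our own illustration, not drawn from the model) shows how such a binary matrix captures overlapping memberships:

```python
import numpy as np

# A toy community assignment matrix Z: 4 objects (rows), 3 features (columns).
# In the nonparametric setting the number of columns is unbounded; here we
# simply show the features that happen to be used.
Z = np.array([[1, 0, 1],   # object 0 belongs to communities 0 and 2
              [0, 1, 0],   # object 1 belongs to community 1 only
              [1, 1, 0],   # object 2 belongs to communities 0 and 1
              [0, 0, 0]])  # object 3 belongs to no community

# Objects 0 and 2 share community 0, so memberships overlap.
print("members of community 0:", np.nonzero(Z[:, 0])[0])  # -> [0 2]
```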

Most existing works represent a network as a symmetric binary adjacency matrix and, for simplicity, choose a Bernoulli distribution (or a logistic Gaussian distribution) to formulate the generative mechanism. The symmetric binary adjacency matrix representation has two limitations. (1) When we transform count-value networks into a symmetric binary adjacency matrix, we lose much valuable network information that can help find overlapping communities. For example, in a binary network all nodes play equal roles within a community, as there are only two situations: linked or not linked. If we instead consider the number of interactions between nodes, the nodes no longer play equal roles; the count values may indicate which nodes are at the core of a community and which are at the periphery. (2) MCMC (Markov chain Monte Carlo) inference for a generative model with a Bernoulli likelihood is difficult to derive.

Count-value networks arise naturally and are pervasive in modern life. For example, in communication networks such as email networks, phone call networks, instant messaging networks, and worker recruitment influence networks in mobile crowd sensing (MCS) platforms [3], interactions are often directed and have an associated count value; i.e., person i can send mails to (call or message) person j many times. On online social media platforms such as Twitter, Facebook, BBS, and MCS [4], people follow (comment on, like, or reply to) those they are interested in; such interactions also have direction and an associated interaction count.

In this article, we focus on overlapping community detection for count-value social networks. We propose a generative model for count-value networks with overlapping community structure: the network is modeled as a Poisson point process, and applying Poisson factor analysis to the corresponding count matrix yields \(M=Z\Lambda Z^T\), which is akin to the mixed membership stochastic block model (MMSB) [5] in its ability to express overlapping community structure. The IBP is used as the prior on the community assignment matrix Z, which allows the number of communities K to be determined at inference time instead of being predefined. Both a collapsed and an uncollapsed Gibbs sampler for the generative model are derived. We reinforce the validity of the theoretical results via extensive experiments on simulated and real network data.

Related works

Following the seminal work of Erdős and Rényi [6], various random graph models have been proposed. The celebrated SBM (stochastic block model) [7] and its extensions, such as the IRM (infinite relational model, Kemp et al. [8]), MMSB (Airoldi et al. [5]), DCSBM (degree-corrected SBM, Karrer et al. [9]), and DSBM (dynamic SBM, Pensky [10]), have a wide variety of applications in network community detection and form a huge corpus, especially in the social sciences and machine learning. We do not present an exhaustive review here; for up-to-date accounts of various aspects, we direct the reader to Fortunato [11], Xie et al. [12], and Matias et al. [13].

There are already pioneering works that combine the ideas of the classical MMSB model with the nonparametric Bayesian approach, increasing the flexibility of the network generative process by letting each node possess a potentially infinite number of features; one example is the celebrated LFRM (latent feature relational model) proposed by Miller et al. [14], which was previously described in Meeds et al. [15]. The IMRM (infinite multiple relational model) proposed by Morup et al. [16] is a variant of the LFRM in which a noisy-or likelihood is used instead of the logistic Gaussian likelihood. The ILA (infinite latent attribute) model [17] presented in Palla et al. (2015) generalizes the LFRM by allowing an explicit representation of the partitioning of each general community into subclasses, thus providing a more structured representation of the data. All these models assume that K is not known a priori and use the IBP to account for the number of latent communities.

Although most existing work does not consider count-value networks, there are exceptions. For example, Karrer and Newman introduced the DCSBM [9]; they assumed that the links between nodes i and j follow a Poisson distribution and thus represented the network as a count adjacency matrix. This is reasonable, as the Poisson distribution is the natural probability distribution for modeling counts. Herlau et al. [18] formulated a nonparametric Bayesian generative model for the DCSBM (named IDCSBM), in which the number of communities is inferred via the Chinese restaurant process [19]. These two models can detect only nonoverlapping communities.

The celebrated IBP was originally studied by Griffiths and Ghahramani [2]. Thibaux and Jordan [20] connected the IBP to the theory of completely random measures by showing that it can be constructed from an exchangeable sequence of beta-Bernoulli processes; they further showed that the beta process is the underlying de Finetti mixing measure for the IBP.

The Poisson factor model, also known as the gamma-Poisson model, is a probabilistic matrix factorization model that has been widely used in areas such as image reconstruction, text information retrieval, and collaborative filtering. The first application of Poisson factor analysis to network analysis was presented in Zhou et al. [21].

The proposed model

Let \(G= (V,E)\) denote a count-value graph and \(G_t= (V_t,E_t)\) denote a network snapshot observed at time t. \(V_t=\{v_1,v_2,\ldots ,v_N\}\) is the node set of \(G_t\); nodes often correspond to persons or objects in the network, and \(N=|V_t|\) is the number of nodes. \(E_t\) is the edge set; edges often correspond to relationships between objects, and each observed edge is inherently associated with a count value \(m_{ij}\). The dynamically evolving network G can be modeled as a random process, and this infinite random process can be decomposed into many observed network snapshots. Each network snapshot \(G_t\) is finite, so it corresponds to an adjacency matrix M whose entries are counts. Applying Poisson factor analysis to the random count matrix M yields \(M=Z\Lambda Z^T\), where the \(N\times K\) matrix Z is called the community assignment matrix of the network and the \(K\times K\) square matrix \(\Lambda \) is called the community compatibility matrix. In this case, we have

$$\begin{aligned} m_{ij}\sim Poisson \Bigg(\sum \limits _{k_1=1}^K \sum \limits _{k_2=1}^Kz_{ik_1}\lambda _{k_1k_2}z_{jk_2}\Bigg), \end{aligned}$$

where \(z_{ik_1}\) expresses how strongly node i is affiliated with community \(k_1\), and \(\lambda _{k_1k_2}\) measures how strongly communities \(k_1\) and \(k_2\) interact with each other. The product \(z_{ik_1}\lambda _{k_1k_2}z_{jk_2}\) measures how strongly nodes i and j are connected due to their affiliations with communities \(k_1\) and \(k_2\), respectively. One caveat is that the infinite gamma-Poisson model often uses the multi-scoop IBP, a distribution over a random count matrix, as the prior on Z; here, however, we use the basic IBP, which is a distribution over a random binary matrix.

The generative process of our model is as follows:

$$\begin{aligned} \begin{aligned} P(M)=&\prod \limits _{j=1}^N \prod \limits _{i=1}^N P(m_{ij}),\quad m_{ij}\sim Poisson(\rho _{ij}) \\ m_{ij} =&\sum _{k_1=1}^K \sum _{k_2=1}^K m_{ik_1k_2j}, \quad m_{ik_1k_2j} \sim Poisson (z_{ik_1}\lambda _{k_1k_2}z_{jk_2} )\\ \rho _{ij} =&\sum _{k_1=1}^K \sum _{k_2=1}^K z_{ik_1}\lambda _{k_1k_2}z_{jk_2} =Z_iZ_j*\lambda \\ Z=&\,(Z_1,\ldots ,Z_N)^T, \quad Z\sim IBP(\alpha ,N),\quad \alpha \sim Gamma(e,f)\\ P(\Lambda )=&\prod \limits _{k_1=1}^K \prod \limits _{k_2=1}^K P(\lambda _{k_1k_2}),\quad \lambda _{k_1k_2}=\lambda ,\quad \lambda \sim Gamma(a,b) \end{aligned} \end{aligned}$$
(1)

Here we let all \(\lambda _{k_1k_2}=\lambda \); the reason is explained in the "Inference tricks" subsection. The probabilistic graphical model representation of the generative process is depicted in Fig. 2.
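To make the generative process concrete, the following is a minimal Python sketch of Eq. (1), assuming the restaurant-style IBP construction, treating f as a scale parameter and b as a rate parameter of the gamma priors; the function names are ours and not part of a published implementation.

```python
import numpy as np

def sample_ibp(alpha, N, rng):
    """Draw Z ~ IBP(alpha, N) via the Indian buffet (restaurant) construction."""
    Z = np.zeros((N, 0), dtype=int)
    for i in range(N):
        if Z.shape[1] > 0:
            m = Z[:i].sum(axis=0)                 # popularity of each existing dish
            Z[i] = rng.random(Z.shape[1]) < m / (i + 1)
        k_new = rng.poisson(alpha / (i + 1))      # number of brand-new dishes
        if k_new > 0:
            new = np.zeros((N, k_new), dtype=int)
            new[i] = 1
            Z = np.hstack([Z, new])
    return Z

def sample_network(N=30, a=1.0, b=1.0, e=14.0, seed=0):
    """Generate (M, Z, alpha, lam) following Eq. (1) with lambda_{k1k2} = lambda."""
    rng = np.random.default_rng(seed)
    f = 1.0 / (1.0 / np.arange(1, N + 1)).sum()   # f = 1 / HN (Nth harmonic number)
    alpha = rng.gamma(e, f)                       # alpha ~ Gamma(e, f)
    lam = rng.gamma(a, 1.0 / b)                   # lambda ~ Gamma(a, b), b as a rate
    Z = sample_ibp(alpha, N, rng)
    rho = lam * (Z @ Z.T)                         # rho_ij = lambda * (Z_i . Z_j)
    return rng.poisson(rho), Z, alpha, lam        # m_ij ~ Poisson(rho_ij)
```

Note that this sketch also generates the diagonal entries \(m_{ii}\); for data without self-loops these can simply be zeroed.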

Fig. 2 Probabilistic graphical model representation of the rILFM model

The validity of this Poisson factor analysis is guaranteed by the superposition principle of Poisson point processes.

Superposition is an additive set operation such that the superposition of a k-point configuration in \(X^n\) is a kn-point configuration in X. Examples of Poisson superposition processes include the compound Poisson process and the negative binomial process.

Theorem 1

(Poisson superposition principle) Given k independent Poisson point processes \(\Pi _1,\Pi _2,\ldots ,\Pi _k\) with corresponding counting processes \(N_1,N_2,\ldots ,N_k\) and intensity measures \(\mu _1,\mu _2,\ldots ,\mu _k\), the union \(\Pi =\cup _{i=1}^k\Pi _i\) is also a Poisson point process; its counting process is \(N=\sum _{i=1}^kN_i\) and its intensity measure is \(\mu =\sum _{i=1}^k\mu _i\) [22].
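As a quick empirical check of Theorem 1 (our own illustration), the superposition of independent Poisson counts is again Poisson with the summed intensity:

```python
import numpy as np

rng = np.random.default_rng(1)
mus = np.array([0.7, 1.3, 2.0])                  # intensities mu_1, mu_2, mu_3
counts = rng.poisson(mus, size=(100_000, 3))     # independent Poisson draws
total = counts.sum(axis=1)                       # the superposed counting process

# By Theorem 1, total ~ Poisson(sum(mus)) = Poisson(4.0), so the empirical
# mean and variance should both be close to 4.0.
print(total.mean(), total.var())
```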

We apply the restriction that links are generated directly by individual features rather than through complex interactions between features, so that feature and community are the same concept; i.e., stating that node i possesses feature j is equivalent to stating that node i is affiliated with community j.

As shown in Fig. 1b, the assignment of nodes to a set of communities can be expressed as a binary matrix with infinitely many columns and exchangeable rows, where the ith row is the community assignment vector \(Z_i\) of node i and the jth column corresponds to a community; \(z_{ij}=1\) indicates that node i is affiliated with community j. Since \(Z_i\) may have many nonzero elements, i.e., there is no assumption of mutual exclusivity or exhaustiveness, the community affiliation matrix Z can characterize overlapping community structure in a network.

Parameter inference

The IBP is a distribution over exchangeable binary matrices and can be constructed in two ways: the restaurant construction and the stick-breaking construction. The former lends itself to MCMC inference, while the latter lends itself to variational inference [23]. Although the execution time of MCMC inference is cubic in the number of observations and thus often scales poorly [24], MCMC is our only option for inferring the rILFM model if we do not want to predefine K: the stick-breaking construction of the IBP leads to a variational method based on truncation to a finite model, so we would have to predefine the truncation level, which is as difficult as predefining K.

In this paper, we derive both a collapsed and an uncollapsed Gibbs sampler for Z. In the "Uncollapsed Gibbs sampler" subsection, we present the MCMC inference algorithm based on the uncollapsed Gibbs sampler, and in the "Collapsed Gibbs sampler" subsection, we give the details of the derivation of the collapsed sampler.

Uncollapsed Gibbs sampler

Let \(M_1\) denote the set of observed links: \( (i,j)\in M_1\) means that there is a link between nodes i and j (in other words, \(m_{ij}>0\)). Let \(EV=\sum \nolimits _{(i,j)\in M_1}m_{ij}\) denote the total number of links; let \(C=\sum \nolimits _{i=1}^N\sum \nolimits _{j=1}^N(Z_iZ_j)=Z\bigodot Z^T\) denote the total number of communities shared by node pairs \( (i,j)\in M\) (\(\bigodot \) denotes the Hadamard product on matrices); let HN denote the Nth harmonic number; let \(Z_{-ik}\) denote all community assignments except \(z_{ik}\); and let \(k_{new}\) denote the number of new features sampled for each object. The inference procedure of our model is as follows:

Algorithm 1 MCMC inference based on the uncollapsed Gibbs sampler

In each sampling iteration, for each object, when we determine the number of new features, the likelihood \(P(M|Z_{new},\lambda )\) is obtained by marginalizing over \(\Lambda _{new}\):

$$\begin{aligned} \int _{\Lambda _{new}}P(M|Z_{new},\lambda ,\Lambda _{new})P(\Lambda _{new})d\Lambda _{new}. \end{aligned}$$

We perform a Monte Carlo integration to draw \(k_{new}\) according to \(P(k_{new})\propto Poisson(k_{new};\frac{\alpha }{N})P(M|Z_{new},\lambda )\). This procedure is equivalent to importance sampling: first, we draw many pairs \( (k_{new},\Lambda _{new})\), where \(\Lambda _{new}\) denotes the new part of \(\Lambda \) corresponding to the new features. Then, we assign a weight to each pair based on the data likelihood \(P(M|Z_{new},\lambda ,\Lambda _{new})\). Last, based on the weights, we sample a pair \( (k_{new},\Lambda _{new})\) and take its \(k_{new}\) component as our \(k_{new}\).
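The following is a hedged Python sketch of this step under our simplification \(\lambda _{k_1k_2}=\lambda \) (so \(\Lambda _{new}\) collapses to the shared \(\lambda \)); log_poisson_lik is a hypothetical helper computing \(\log P(M|Z,\lambda )\) as derived in the next subsection, and all names are ours.

```python
import numpy as np

def sample_k_new(M, Z, i, lam, alpha, rng, n_draws=50):
    """Importance-sampling step for the number of new features of object i."""
    N = Z.shape[0]
    draws = rng.poisson(alpha / N, size=n_draws)   # proposals k_new ~ Poisson(alpha/N)
    log_w = np.empty(n_draws)
    for t, k_new in enumerate(draws):
        new = np.zeros((N, k_new), dtype=int)
        new[i] = 1                                 # only object i owns the new dishes
        Z_new = np.hstack([Z, new])
        log_w[t] = log_poisson_lik(M, Z_new, lam)  # weight by the data likelihood
    w = np.exp(log_w - log_w.max())                # stabilize before normalizing
    return rng.choice(draws, p=w / w.sum())        # resample one k_new by weight
```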

Collapsed Gibbs sampler

Unlike the uncollapsed Gibbs sampler, the collapsed Gibbs sampler uses \(P(M|Z,a,b)\) as the likelihood instead of \(P(M|Z,\lambda )\); thus, we need not update \(\lambda \), i.e., step 2.2 of Algorithm 1 can be omitted. As the differences between the two samplers are clear, we do not spell out the collapsed-sampler-based MCMC algorithm; we only give the derivation of the collapsed likelihood here.

First, we derive the likelihood used in the uncollapsed Gibbs sampler. Let \(M_0\) denote the set of observed non-links: \( (i,j)\in M_0\) means that there is no link between nodes i and j (in other words, \(m_{ij}=0\)).

  1. Derive the likelihood in the uncollapsed Gibbs sampler

    $$\begin{aligned}&P(M|Z,\lambda )=\prod \limits _{(i,j)\in M_1}\frac{\rho _{ij}^{m_{ij}}}{m_{ij}!}exp(-\rho _{ij})\prod \limits _{(i,j)\in M_0}exp(-\rho _{ij})\\&\quad =\prod \limits _{(i,j)\in M_1}\frac{\rho _{ij}^{m_{ij}}}{m_{ij}!}\prod \limits _{(i,j)\in M}exp(-\rho _{ij})\\&\quad =\prod \limits _{(i,j)\in M_1}\frac{\left(\lambda \sum Z_iZ_j\right)^{m_{ij}}}{m_{ij}!}\prod \limits _{(i,j)\in M}exp\left(-\lambda \sum (Z_iZ_j)\right)\\&\quad =\frac{\prod \left(\sum Z_iZ_j\right)^{m_{ij}}}{\prod m_{ij}!}\,\lambda ^{\sum m_{ij}}\,exp\left(-\lambda \sum \limits _{(i,j)\in M}\sum (Z_iZ_j)\right)\\&\quad =\frac{\prod \left(\sum Z_iZ_j\right)^{m_{ij}}}{\prod m_{ij}!}\,\lambda ^{EV}exp(-C\lambda ) \end{aligned}$$

    As this likelihood is conjugate to the gamma prior on \(\lambda \), we can integrate out \(\lambda \) to obtain the likelihood used in the collapsed sampler.

  2. Integrate out \(\lambda \) to obtain the likelihood in the collapsed sampler

    $$\begin{aligned}&P(M|Z,a,b)=\int _\lambda P(M|Z,\lambda )P(\lambda |a,b)d\lambda \\&\quad =\int _\lambda \frac{\prod (\sum Z_iZ_j)^{m_{ij}}}{\prod m_{ij}!}\lambda ^{EV}exp(-C\lambda )\frac{b^a}{\Gamma (a)}\lambda ^{a-1}exp(-b\lambda )d\lambda \\&\quad =\frac{\prod (\sum Z_iZ_j)^{m_{ij}}}{\prod m_{ij}!}\frac{b^a}{\Gamma (a)}\int _\lambda \lambda ^{a+EV-1}exp(-(b+C)\lambda )d\lambda \\&\quad =\frac{\prod (\sum Z_iZ_j)^{m_{ij}}}{\prod m_{ij}!}\frac{b^a}{\Gamma (a)}\frac{\Gamma (a+EV)}{(b+C)^{a+EV}}\int _\lambda \frac{(b+C)^{a+EV}}{\Gamma (a+EV)}\lambda ^{a+EV-1}exp(-(b+C)\lambda )d\lambda \\&\quad =\frac{\prod (\sum Z_iZ_j)^{m_{ij}}}{\prod m_{ij}!}\frac{b^a}{\Gamma (a)}\frac{\Gamma (a+EV)}{(b+C)^{a+EV}}\\&\quad =\frac{\prod (\sum Z_iZ_j)^{m_{ij}}}{\prod m_{ij}!}\frac{b^a\prod _{k=0}^{EV-1}(a+k)}{(b+C)^{a+EV}} \end{aligned}$$
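The two closed forms above translate directly into log-space Python (a sketch in our notation; gammaln replaces the factorials and gamma functions for numerical stability):

```python
import numpy as np
from scipy.special import gammaln

def log_poisson_lik(M, Z, lam):
    """log P(M | Z, lambda): the uncollapsed likelihood derived above."""
    S = Z @ Z.T                         # S_ij = Z_i . Z_j, shared community counts
    linked = M > 0                      # the set M_1
    EV = M[linked].sum()                # total number of links
    C = S.sum()                         # total shared communities over all pairs
    with np.errstate(divide="ignore"):  # a linked pair with S_ij = 0 yields -inf
        term = (M[linked] * np.log(S[linked]) - gammaln(M[linked] + 1)).sum()
    return term + EV * np.log(lam) - C * lam

def log_collapsed_lik(M, Z, a, b):
    """log P(M | Z, a, b): lambda integrated out against Gamma(a, b)."""
    S = Z @ Z.T
    linked = M > 0
    EV = M[linked].sum()
    C = S.sum()
    with np.errstate(divide="ignore"):
        term = (M[linked] * np.log(S[linked]) - gammaln(M[linked] + 1)).sum()
    return (term + a * np.log(b) - gammaln(a)
            + gammaln(a + EV) - (a + EV) * np.log(b + C))
```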

Inference tricks

To derive a feasible MCMC inference procedure, we make the following assumptions in our model:

  1. We assume that \(\Lambda \) is a diagonal matrix: links only exist between nodes in the same community, i.e., there is no link from a node in community \(k_1\) to a node in community \(k_2\) when \(k_1\ne k_2\);

  2. We restrict all link rates \(\lambda _{k_1k_2}\) to take the same value \(\lambda \); this means that nodes within each community have the same propensity to form a link.

These two assumptions bring two benefits: first, we do not need to change the shape of \(\Lambda \) as K changes; second, we obtain conjugacy between the likelihood and the gamma prior on \(\lambda \). Under these circumstances, \(\lambda \) can be integrated away and a collapsed Gibbs sampler for Z can be derived.

The IBP has a major weakness: the generated Z is determined only by N and \(\alpha \), regardless of the characteristics of the observations. For example, if node i is an isolated node, its community assignment vector should be an all-zero vector, but the IBP ignores this fact and assigns node i to some communities. We take the following steps to correct this mistake and to avoid unnecessary updates of the all-zero rows of Z, thereby accelerating the MCMC inference (a code sketch follows the list).

  1. Assign a flag to each isolated node

    We maintain a flag vector initialized to all zeros. First, we check each node in the graph; if its in-degree and out-degree are both zero, we set its flag to one to indicate that the node is not affiliated with any community.

  2. Skip unnecessary update steps

    After the initial Z has been generated, we set the corresponding rows of Z to all-zero vectors according to the flags. During each MCMC iteration, when we update Z, we do not update the row of any node whose flag is one.

    After posterior inference on Z has been performed, based on the assumption that a community should contain at least three nodes, we remove those columns of the inferred Z that have fewer than three nonzero values.
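A minimal sketch of these two tricks (the helper names are ours):

```python
import numpy as np

def isolated_flags(M):
    """Flag nodes whose in-degree and out-degree are both zero."""
    return (M.sum(axis=0) == 0) & (M.sum(axis=1) == 0)

def prune_small_communities(Z, min_size=3):
    """After inference, drop columns of Z with fewer than min_size members."""
    return Z[:, Z.sum(axis=0) >= min_size]

flags = isolated_flags(M)       # M is the observed count adjacency matrix
Z[flags] = 0                    # isolated nodes belong to no community
# ... inside each Gibbs sweep, skip every row i with flags[i] == True ...
Z = prune_small_communities(Z)  # applied once posterior inference is done
```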

Per-iteration running times

For both the uncollapsed and the collapsed Gibbs sampler, when analyzing algorithm complexity, we only count the Hadamard product operations on Z (i.e., the element-wise matrix multiplications \(Z\bigodot Z^T\)) required for one sweep through an \(N\times K\) community assignment matrix Z under the compound Poisson likelihood model.

The running times of both Gibbs samplers are dominated by the computation of the likelihood. When we change one element of Z, the likelihood must be calculated twice; thus, Z can be updated in \(O(N^3K)\) time.

Experiments

We implemented our model and the inference algorithm in Python. After the Bayesian analysis is finished, the posterior, which contains all the information about the model parameters given the observed data and the model, needs to be summarized [25].

For scalar parameters such as \(\alpha \) and \(\lambda \), the result is easy to communicate, as the most probable posterior value is given by the mode of the posterior distribution (i.e., the peak of the distribution). It is also good practice to report the mean (or median) of the distribution together with some measure of dispersion, such as the standard deviation or the HPD (highest posterior density) interval, to convey the uncertainty in the estimate [25].
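For completeness, here is a self-contained sketch of an HPD computation over posterior samples (our own implementation; [25] obtains the same summaries with standard Bayesian tooling):

```python
import numpy as np

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing a fraction `cred` of the samples."""
    x = np.sort(np.asarray(samples))
    width = int(np.ceil(cred * len(x)))     # points the window must cover
    starts, ends = x[: len(x) - width], x[width:]
    i = np.argmin(ends - starts)            # narrowest qualifying window
    return starts[i], ends[i]

# e.g., lo, hi = hpd_interval(alpha_samples) for the retained alpha draws
```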

Experiment on synthetic data

We analyzed a synthetic network generated according to our network generative model; because the ground truth is known, it is easy to empirically validate our theoretical findings. We generated synthetic data from the IBP prior (with \(N = 30\), \(a = b = 1\), \(e = 14\), \(f = 1/HN\), \(\alpha \sim Gamma(e,f)\), \(\alpha =1.7658\)) and the compound Poisson model (with \(\lambda \sim Gamma(a,b)\), \(\lambda =0.3872\)). The simulated graph is directed, with 30 nodes and 666 edges; its adjacency matrix M and community assignment matrix Z are depicted in Fig. 3a, b.

Fig. 3 a The adjacency matrix M of the simulated graph; b its community assignment matrix Z (a and b are the ground truth); c the Z inferred via the uncollapsed sampler, obtained from chain2 at the 6997th MCMC iteration; d the Z sampled from chain2 at the 1000th MCMC iteration

We ran six chains: chain1, chain2, and chain3 use the uncollapsed sampler (denoted U), and chain4, chain5, and chain6 use the collapsed sampler (denoted C). Chain1 and chain4 start with \(a = b = 1\), \(e = 4\), \(f = 1/HN\), \(\alpha =0.1543\); chain2 and chain5 start with \(a = b = 1\), \(e = 14\), \(f = 1/HN\), \(\alpha =1.7658\), i.e., the ground truth values of all parameters; chain3 and chain6 start with \(a = b = 1\), \(e = 24\), \(f = 1/HN\), \(\alpha =3.7991\). We ran each chain for \(maxIter=10{,}000\) MCMC iterations, discarded the first \(burnin=3000\) samples, and collected the last 7000 samples. The occurrence counts of all the \(K_s\) values sampled from the six chains are given in Table 1.

Table 1 Occurrence counts of all the \(K_s\) values sampled from the six chains

As depicted in Table 1 and Fig. 4, in all six chains the mode of \(K_s\) is 8, which matches the known ground truth. Thus, we conclude that all six chains converge to the true posterior distribution over Z. The inference is nevertheless biased with respect to the different settings: the values of \(K_s\) span from 4 to 15, and chain1 and chain2 show smaller dispersion in \(K_s\) than chain4, chain5, and chain6. From this perspective, the uncollapsed samplers give better inference results than the collapsed samplers. We can also see that when \(\alpha \) takes a small value, samples with \(K_s=7\) outnumber samples with \(K_s=9\); when \(\alpha \) takes a larger value, the number of samples with \(K_s=9\) grows. That is, the setting strongly affects the statistical dispersion of K.

Fig. 4 Histograms of the retained \(K_s\) values for the six chains

For structured parameters such as the \(z_{ik}\)s, common practice is to take the mode of \(K_s\) as the value of K and to take the last sample with that K as Z. Chain1 does not discriminate well, because its numbers of samples with \(K_s=7\) and \(K_s=8\) are almost equal; therefore, we use the 6997th sample drawn from chain2 as our posterior inference result. See Fig. 5 for the program output.
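A sketch of this summarization rule (a hypothetical helper of ours; Z_samples is the list of matrices retained after burn-in):

```python
import numpy as np

def summarize_Z(Z_samples):
    """Return the last retained sample whose K equals the mode of K_s."""
    Ks = np.array([Z.shape[1] for Z in Z_samples])
    vals, counts = np.unique(Ks, return_counts=True)
    K_mode = vals[np.argmax(counts)]          # posterior mode of K_s
    idx = np.flatnonzero(Ks == K_mode)[-1]    # last sample attaining it
    return Z_samples[idx], K_mode, idx
```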

Fig. 5 Program running results

The inferred Z is depicted in Fig. 3c. We compare the posterior inference results with the ground truth and with the 1000th sample (depicted in Fig. 3d) by listing their communities in Table 2. The second row of Table 2 records the true communities; we can see that C1, C2, …, C8 are subsets of \(V_t\) and that \(C_i \bigcap C_j \ne \emptyset , \forall i,j=1,2,\ldots ,8\); i.e., C1, C2, …, C8 are overlapping communities.

Table 2 The true communities and the inferred communities

From Table 2, we can see that the two biggest communities, C1 and C2, contain the same objects in both the ground truth and the inferred results, but the small communities differ from each other. Only six objects (\(v_{11},v_{18},v_{19},v_{22},v_{27},v_{29}\)) have exactly the same community affiliations, implying that for an unsupervised learning task such as overlapping community detection, even when the ground truth is known, it is hard to obtain exact results via statistical machine learning. The Z sampled from chain2 at the 1000th MCMC iteration is very far from the ground truth, so it is necessary to discard the burn-in samples.

Comparing the histogram of \(\alpha \) (middle of Fig. 6) and the histogram of \(\lambda \) (right of Fig. 6) for chain2, we find that \(\alpha \) varies over a larger range, while \(\lambda \) varies over a smaller one.

Fig. 6 Histograms of the retained \(K_s\), \(\alpha \), and \(\lambda \) values drawn from chain2

This conclusion is also verified by the metrics depicted in Fig. 7: the HPD interval of \(\lambda \) (Fig. 7a) is shorter than that of \(\alpha \) (Fig. 7b). The HPD is the minimum-width Bayesian credible interval, i.e., the shortest interval containing a given portion of the probability density. The most commonly used are the \(95\%\) and \(98\%\) HPDs, often accompanied by the \(50\%\) HPD.

Fig. 7 HPD intervals of the retained \(\alpha \) and \(\lambda \) values drawn from chain2

In Fig. 7, the black curve describes the posterior using a kernel density estimate; the mode is marked, and ROPE denotes the lower and upper bounds of the region of practical equivalence. When we say that the \(95\%\) HPD for \(\alpha \) is [1.33, 4.78], we mean that, according to our data and model, \(\alpha \) lies between 1.33 and 4.78 with probability 0.95. The \(95\%\) HPD intervals of the retained \(\alpha \) and \(\lambda \) values drawn from chain1 and chain3 are depicted in Fig. 8.

We summarize the modes and \(95\%\) HPD intervals of the retained \(\alpha \) and \(\lambda \) values drawn from all three chains in Table 3, and we conclude that the setting has only a small effect on the statistical dispersion of \(\alpha \) and \(\lambda \).

Table 3 Summary of the modes and 95% HPD intervals of the retained \(\alpha \) and \(\lambda \)

Experiment on the LESMIS network

Most existing benchmark data sets do not produce good results in our experiments. One reason is that most available network data are binary networks. Another is that a large number of count-value networks are overdispersed; thus, the Poisson likelihood is not a good choice for modeling them. Although the negative binomial likelihood is more suitable for such overdispersed count data, inference for an rILFM model with a negative binomial likelihood is very sensitive to the starting position and thus requires great care in selecting an appropriate starting point. At present, we are still working on this method.

Fig. 8 \(95\%\) HPD intervals of the retained \(\alpha \) and \(\lambda \) values drawn from chain1 and chain3: a, c correspond to chain1; b, d correspond to chain3

The LESMIS network is included in the collection of Miscellaneous Networks and describes the coappearance of characters in Les Misérables by Victor Hugo. The undirected weighted graph contains 77 nodes and 254 edges; its density is 0.0868079, maximum degree 36, average degree 6, assortativity − 0.165225, number of triangles 1.4K, average number of triangles 18, maximum number of triangles 82, average clustering coefficient 0.573137, fraction of closed triangles 0.498932, and lower bound on the maximum clique 10. More information is provided in [26]. The visualization of the LESMIS network depicted in Fig. 9 was obtained via the interactive graph visualization platform provided by networkrepository.com [26].

Fig. 9 Visualization of the LESMIS network

We obtained the data file in GML format and converted it into a CSV file; the file contains an upper triangular matrix with all diagonal elements equal to 0. Note that we have no ground truth for Z or K. For greater reliability, we ran two chains: chain1, which starts with \(a = b = 1\), \(e = 24\), \(f = 1/HN\); and chain2, which starts with \(a = b = 1\), \(e = 44\), \(f = 1/HN\). We ran each chain for \(maxIter=10{,}000\) MCMC iterations, with \(burnin=4000\), and collected the last 6000 samples.
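Our preprocessing can be sketched as follows, assuming the standard GML file from [26] with edge weights stored in a `value` attribute (the file name and attribute name are our assumptions):

```python
import networkx as nx
import numpy as np

G = nx.read_gml("lesmis.gml")               # 77 nodes, 254 weighted edges
A = nx.to_numpy_array(G, weight="value")    # weighted adjacency matrix
M = np.triu(A, k=1).astype(int)             # upper triangle, zero diagonal
np.savetxt("lesmis.csv", M, fmt="%d", delimiter=",")
```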

As shown in Fig. 10a, b, both chains mix well. The occurrence counts of all the

Fig. 10 Trajectories of the sampled \(K_s\) values: a chain1; b chain2

\(K_s\) values sampled from the two chains are given in Table 4. We can see that for both chains, the mode of all \(K_s\) values is 15. Thus, the inferred true K value of \(G_o\) is \(K_o=15\).

Table 4 Occurrence counts of all the \(K_s\) values sampled from the two chains

From Fig. 11, we can see that the 5962th sample is the last sample drawn from chain1 that satisfies \(K_s=15\). We therefore chose this sample as the posterior inference result for Z; i.e., the community assignment matrix of the observed graph \(G_o\) is the one sampled at the 5962th iteration.

Fig. 11 \(K_s\) values of the last 40 samples drawn from chain1

Figure 12a–c depicts the histograms of \(K_s\), \(\alpha \), and \(\lambda \) for the samples retained from chain1; d–f show the same for chain2. Although the starting positions of the two chains differ, the posterior distributions of the parameters inferred via MCMC are very close to each other.

Fig. 12 Histograms of the retained \(K_s\), \(\alpha \), and \(\lambda \) values: a–c correspond to samples retained from chain1; d–f correspond to chain2

Figure 13a, b depicts the HPD intervals of \(\alpha \) and \(\lambda \) for the samples retained from chain1; c, d show the same for chain2. For chain1, the \(95\%\) HPD for \(\alpha \) is [1.07, 3.27] with mode 1.98, and the \(95\%\) HPD for \(\lambda \) is [0.85, 1.01] with mode 0.94. For chain2, the \(95\%\) HPD for \(\alpha \) is [1.19, 3.41] with mode 2.14, and the \(95\%\) HPD for \(\lambda \) is [0.87, 1.01] with mode 0.93. From this perspective, the two chains have comparable inference quality on the scalar parameters. However, if we compare the dispersion of \(K_s\), the inference quality of chain1 is better than that of chain2.

Fig. 13 HPD intervals of the retained \(\alpha \) and \(\lambda \) values: a, b correspond to samples retained from chain1; c, d correspond to chain2

Conclusion

This paper makes the following contributions: (1) we propose a generative model for count-value networks with overlapping community structure; (2) we use the IBP to model the community assignment matrix Z, so the number of communities K need not be fixed in advance and can grow as more data are encountered; (3) we derive both an uncollapsed and a collapsed Gibbs sampler for the generative model; (4) we analyze the inference quality for the scalar parameters; and (5) we conduct extensive experiments on simulated and real network data, finding that the proposed model and inference procedure deliver the desired experimental results.

Most count-value networks are overdispersed, and the negative binomial likelihood is more suitable for such data. However, inference for an rILFM model with a negative binomial likelihood requires great care in selecting an appropriate starting point; we leave this as future work.

For scalar parameters, the posterior inference result is easy to communicate. For structured parameters such as the \(z_{ik}\hbox {s}\), however, how to summarize the posterior inference results and estimate the inference quality remains a considerable challenge; we leave this as another item of future work.