Abstract
Motivated by the complexity of network data, we propose a directed hybrid random network that mixes preferential attachment (PA) rules with uniform attachment rules. When a new edge is created, it follows the PA rule with probability \(p\in (0,1)\); otherwise, it is added between two uniformly chosen nodes. Such a mixture makes the in- and out-degrees of a fixed node grow at a slower rate than in the pure PA case, thus leading to lighter distributional tails. For estimation and inference, we develop two numerical methods, which are applied to both synthetic and real network data. We see that, with the extra flexibility given by the parameter \(p\), the hybrid random network provides a better fit to real-world scenarios in which lighter-tailed in- and out-degree distributions are observed.
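The growth mechanism described above can be illustrated with a short simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the three-scenario convention of the standard directed PA model (new node to existing node with probability `alpha`, existing to existing with probability `beta`, existing node to new node otherwise), offset parameters `delta_in` and `delta_out`, and a seed graph consisting of one node with a self-loop. The function name `simulate_hybrid` is hypothetical.

```python
import random


def simulate_hybrid(n, p, alpha, beta, delta_in, delta_out, seed=None):
    """Grow a directed hybrid network for n steps (one new edge per step).

    Returns (d_in, d_out): in- and out-degrees indexed by node label.
    """
    rng = random.Random(seed)
    gamma = 1.0 - alpha - beta
    if not (0.0 < p < 1.0) or min(alpha, beta) < 0.0 or gamma < -1e-12:
        raise ValueError("need 0 < p < 1 and alpha, beta, gamma >= 0")
    # Assumed seed graph: a single node with a self-loop.
    d_in, d_out = [1], [1]

    def pa_pick(deg, delta):
        # Preferential choice: node v is chosen with weight deg[v] + delta.
        weights = [d + delta for d in deg]
        return rng.choices(range(len(deg)), weights=weights)[0]

    for _ in range(n):
        use_pa = rng.random() < p   # one coin flip per new edge
        existing = len(d_in)        # number of nodes before this step
        u = rng.random()
        if u < alpha:
            # Scenario 1 (assumed): new node -> existing node.
            tgt = pa_pick(d_in, delta_in) if use_pa else rng.randrange(existing)
            d_in.append(0)
            d_out.append(0)
            src = existing
        elif u < alpha + beta:
            # Scenario 2 (assumed): existing node -> existing node.
            if use_pa:
                src = pa_pick(d_out, delta_out)
                tgt = pa_pick(d_in, delta_in)
            else:
                src = rng.randrange(existing)
                tgt = rng.randrange(existing)
        else:
            # Scenario 3 (assumed): existing node -> new node.
            src = pa_pick(d_out, delta_out) if use_pa else rng.randrange(existing)
            d_in.append(0)
            d_out.append(0)
            tgt = existing
        d_out[src] += 1
        d_in[tgt] += 1
    return d_in, d_out


d_in, d_out = simulate_hybrid(n=2000, p=0.5, alpha=0.3, beta=0.4,
                              delta_in=1.0, delta_out=1.0, seed=42)
print(len(d_in), max(d_in), max(d_out))
```

Running the sketch for several values of `p` and comparing the largest degrees gives an informal view of how the uniform component affects the tails; the weighted sampling here costs time linear in the number of nodes per step, so it is suitable only for small `n`.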
References
Alves, C., Ribeiro, R., Sanchis, R. (2019). Preferential attachment random graphs with edge-step functions. Journal of Theoretical Probability, 34(1), 438–476.
Atalay, E., Hortaçsu, A., Roberts, J., Syverson, C. (2011). Network structure of production. Proceedings of the National Academy of Sciences of the United States of America, 108(13), 5199–5202.
Barabási, A.-L., Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Chen, M.-H., Shao, Q.-M., Ibrahim, J. G. (2010). Monte Carlo Methods in Bayesian Computation. New York, NY: Springer-Verlag.
Cooper, C., Frieze, A. (2003). A general model of web graphs. Random Structures and Algorithms, 22(3), 311–335.
Csardi, G., Nepusz, T. (2006). The igraph software package for complex network research. InterJournal Complex Systems, 1695.
Deijfen, M., van den Esker, H., van der Hofstad, R., Hooghiemstra, G. (2009). A preferential attachment model with random initial degrees. Arkiv för Matematik, 47(1), 41–72.
Deijfen, M., van den Esker, H., van der Hofstad, R., Hooghiemstra, G. (2020). A preferential attachment model with random initial degrees. https://arxiv.org/pdf/0705.4151.pdf
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Durrett, R. T. (2006). Random Graph Dynamics. Cambridge, U.K.: Cambridge University Press.
Durrett, R. T. (2019). Probability: Theory and Examples (5 ed.). Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge, U.K.: Cambridge University Press.
Gao, F., van der Vaart, A. (2017). On the asymptotic normality of estimating the affine preferential attachment network models with random initial degrees. Stochastic Processes and their Applications, 127(11), 3754–3775.
Gelman, A., Carlin, J. B., Dunson, D. B., Vehtari, A., Rubin, D. B. (2013). Bayesian Data Analysis. Boca Raton, FL, U.S.A.: Chapman and Hall/CRC.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
Henzinger, M., Lawrence, S. (2004). Extracting knowledge from the World Wide Web. Proceedings of the National Academy of Sciences of the United States of America, 101(supplement 1), 5186–5191.
Hunter, D. R., Goodreau, S. M., Handcock, M. S. (2008). Goodness of fit of social network models. Journal of the American Statistical Association, 103(481), 248–258.
Lagarias, J. C., Reeds, J. A., Wright, M. H., Wright, P. E. (1998). Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1), 112–147.
Liang, F., Liu, C., Carroll, R. J. (2010). Advanced Markov Chain Monte Carlo Methods: Learning from Past Examples. Hoboken, NJ, U.S.A.: Wiley.
Mahmoud, H. M. (2019). Local and global degree profiles of randomly grown self-similar hooking networks under uniform and preferential attachment. Advances in Applied Mathematics, 111, 101930.
Medina, J. A., Finke, J., Rocha, C. (2019). Estimating formation mechanisms and degree distributions in mixed attachment networks. Journal of Physics A: Mathematical and Theoretical, 52, 095001.
Mengersen, K. L., Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms. Annals of Statistics, 24(1), 101–121.
Merton, R. K. (1968). The Matthew effect in science. Science, 159(3810), 56–63.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087.
Nash, J. C. (2014). On best practice optimization methods in R. Journal of Statistical Software, 60(2), 1–14.
Nelder, J. A., Mead, R. (1965). A simple method for function minimization. The Computer Journal, 7(4), 308–313.
Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks. Physical Review E, 65(1), 025102.
Pachon, A., Sacerdote, L., Yang, S. (2018). Scale-free behavior of networks with the copresence of preferential and uniform attachment rules. Physica D: Nonlinear Phenomena, 371, 1–12.
Samorodnitsky, G., Resnick, S., Towsley, D., Davis, R., Willis, A., Wan, P. (2016). Nonstandard regular variation of in-degree and out-degree in the preferential attachment model. Journal of Applied Probability, 53(1), 146–161.
Shao, Z.-G., Zou, X.-W., Jin, Z.-Z. (2006). Growing networks with mixed attachment mechanisms. Journal of Physics A: Mathematical and General, 39, 9.
Smith, B. J. (2007). boa: An R package for MCMC output convergence assessment and posterior inference. Journal of Statistical Software, 21(11), 1–37.
van der Hofstad, R. (2017). Random Graphs and Complex Networks. Cambridge, U.K.: Cambridge University Press.
Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P. (2009, August). On the evolution of user interaction in Facebook. In J. Crowcroft, & B. Krishnamurthy (Eds.), Proceedings of the 2nd ACM Workshop on Online Social Networks (WOSN’09), New York, NY, U.S.A. (pp. 37–42). Association for Computing Machinery.
Wan, P., Wang, T., Davis, R. A., Resnick, S. I. (2017). Fitting the linear preferential attachment model. Electronic Journal of Statistics, 11(2), 3738–3780.
Wang, T., Resnick, S. (2018). Multivariate regular variation of discrete mass functions with applications to preferential attachment networks. Methodology and Computing in Applied Probability, 20(3), 1029–1042.
Wang, T., Resnick, S. (2020). Degree growth rates and index estimation in a directed preferential attachment model. Stochastic Processes and their Applications, 130(2), 878–906.
Wang, T., Resnick, S. I. (2015). Asymptotic normality of in- and out-degree counts in a preferential attachment model. Stochastic Models, 33(2), 229–255.
Wang, T., Resnick, S. I. (2020). A directed preferential attachment model with Poisson measurement. https://arxiv.org/pdf/2008.07005.pdf.
Wang, T., Resnick, S. I. (2021). Common growth patterns for regional social networks: A point process approach. Journal of Data Science. https://doi.org/10.6339/21-JDS1021.
Zhang, P., Mahmoud, H. M. (2020). On nodes of small degrees and degree profile in preferential dynamic attachment circuits. Methodology and Computing in Applied Probability, 22(2), 625–645.
Acknowledgements
We would like to thank two anonymous referees and the handling AE for constructive reports that helped improve the quality of the paper.
Appendices
A. Proof of Theorem 2
Analogous to the previous proofs, we present the major steps of the proof for the in-degree. To show the convergence of \(\frac{N^\text {in}_m(n)}{n}\), we take two steps. The first is to prove the concentration of \(\frac{N^\text {in}_m(n)}{n}\) around \(\mathbb {E}\left( N^\text {in}_m(n)\right) /{n}\); it then suffices to find the asymptotic limit of \(\mathbb {E}\left( N^\text {in}_m(n)\right) /{n}\).
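Note that when \(\beta =0\), the number of nodes in the graph G(n) is deterministic, so the concentration results in van der Hofstad (2017), Proposition 8.4, are applicable, and we have for \(C>2\sqrt{2}\),
\[ \mathbb {P}\left( \left| N^{\mathrm{in}}_m(n)-\mathbb {E}\left( N^{\mathrm{in}}_m(n)\right) \right| \ge C\sqrt{n\log n}\right) = o(1), \qquad n\rightarrow \infty . \]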
When \(\beta >0\), the total number of nodes in the graph G(n) is random, and a detailed proof is needed. We claim that for \(\beta >0\), there exists some constant \(C>2\sqrt{2}\) such that
The proof of (12) relies on rewriting \(N^{\mathrm{in}}_m(n)-\mathbb {E}(N^{\mathrm{in}}_m(n))\) in terms of a Doob martingale, similar to the argument in the corrected version of Deijfen et al. (2009) (available at https://arxiv.org/pdf/0705.4151.pdf, and cited as Deijfen et al. (2020)). Here, however, the number of nodes created at each step is random, so we need to modify the proof machinery outlined in Deijfen et al. (2020). Recall the notation in Sect. 2 that \(\{J_n:n\ge 1\}\) is a sequence of i.i.d. trinomial random variables on \(\{1,2,3\}\) with cell probabilities \(\alpha \), \(\beta \) and \(\gamma \), respectively. Write \(\{J_k:1\le k\le n\}=:J_{[n]}\), and for \(1\le t\le n\), define
and \(Z_0 = \mathbb {E}\left[ N^{\mathrm{in}}_m(n)\right] \). Then
and \(\{Z_n:n\ge 0\}\) is a martingale with \(\mathbb {E}(|Z_t|) = \mathbb {E}(N^{\mathrm{in}}_m(n)) \le n\).
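The martingale property is the usual tower-property argument for a Doob martingale. As a generic sketch, write \(\mathcal {F}_t\) for the information conditioned on in the definition of \(Z_t\) (the symbol \(\mathcal {F}_t\) is a placeholder introduced here for illustration, assuming only that \(\mathcal {F}_{t-1}\subseteq \mathcal {F}_t\)), so that \(Z_t=\mathbb {E}\left[ N^{\mathrm{in}}_m(n)\mid \mathcal {F}_t\right] \). Then
\[ \mathbb {E}\left[ Z_t\mid \mathcal {F}_{t-1}\right] = \mathbb {E}\left[ \mathbb {E}\left[ N^{\mathrm{in}}_m(n)\mid \mathcal {F}_t\right] \mid \mathcal {F}_{t-1}\right] = \mathbb {E}\left[ N^{\mathrm{in}}_m(n)\mid \mathcal {F}_{t-1}\right] = Z_{t-1}, \]
and since \(N^{\mathrm{in}}_m(n)\ge 0\) implies \(Z_t\ge 0\), taking expectations gives \(\mathbb {E}(|Z_t|)=\mathbb {E}(Z_t)=\mathbb {E}(N^{\mathrm{in}}_m(n))\).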
Then consider
For I, the only change in the conditioning is the extra information contained in G(t), which, compared with that in \(G(t-1)\), specifies how the edge created at the t-th step is constructed. This has the potential to affect the in-degrees of at most 2 nodes, thus leading to \(|I|\le 2\).
For the second term, II, we define \(\bar{J}_t\) to be an independent copy of \(J_t\), which is also independent of \(J_{[n]}\). Write \(\bar{J}_{[n]} :=\{J_1,\ldots , J_{t-1},\bar{J}_t, J_{t+1},\ldots , J_n\}\). Let \(\bar{N}^\mathrm{in}_m(n)\) and \(\bar{D}^\mathrm{in}_v(n)\) be the number of nodes with in-degree m, and the in-degree of node v in the hybrid PA graph, \(\bar{G}(n)=(\bar{V}_n, \bar{E}_n)\), constructed from \(\bar{J}_{[n]}\), respectively. Then we have
Therefore, it suffices to consider
where potential differences will occur only if \(J_t\ne \bar{J}_t\), since \(J_{[n]}\) and \(\bar{J}_{[n]}\) differ only in their t-th entries.
We start by assuming that \(J_t,\bar{J}_t\in \{1,3\}\), i.e. the total numbers of nodes in the two graphs remain unchanged. Then the quantity in (13) is bounded above by
When we either have \(J_t=1, \bar{J}_t=3\) or \(J_t=3, \bar{J}_t=1\), there are at most 2 nodes whose in-degrees will be different. Therefore, when \(J_t,\bar{J}_t\in \{1,3\}\),
If \((J_t,\bar{J}_t)\in \{(2,1), (2,3), (1,2), (3,2)\}\), then \(\bigl ||V_s|-|\bar{V}_s|\bigr |=1\) for all \(s\ge t\), and we need to consider nodes created before and after step t separately. In particular, the difference in the total number of nodes also leads to different attachment probabilities in the two graphs. Without loss of generality, we assume \(J_t=2\) and \(\bar{J}_t\ne 2\). For comparison purposes, we relabel the extra node added at the t-th step as \(t'\), and keep the labeling of the other nodes identical in the two graphs. Then the quantity in (13) is bounded above by
Let N be the first time after t that a new node is created, i.e. \(N:=\inf \{k\ge t+1: J_k\ne 2\}\). Note that for \(s\in \{t+1,\ldots , N\}\), every edge that is added at step s and points to the node \(t'\) leads to a potential difference in the in-degree of a node in \(V_{t-1}\). Hence, apart from node \(t'\), there are at most \(N-t-1\) nodes in \(V_{t-1}\) having different in-degrees in the two graphs. If no edge added between step \(t+1\) and step N points to the node \(t'\), then differences in the in-degree of a particular node may still occur due to the change in the attachment probabilities. This is also the case for nodes added at step N and afterward. To deal with different in-degrees due to changes in the attachment probabilities, we apply a treatment similar to that in Deijfen et al. (2020, Eq. (2.17)).
We now rewrite
Therefore, at least one of the attachments to node \(v\in V_n\) needs to have been made in one of the graphs but not the other. Let s denote the first time at which such an attachment was made differently in the two graphs. Then we have \(D^{\mathrm{in}}_v(s-1)=\bar{D}^\mathrm{in}_v(s-1)\le m\). Hence,
Since \(\sum _{v\in V_{s-1}} (D^{\mathrm{in}}_v(s-1)+\delta _{\mathrm{in}})= s+\delta _{\mathrm{in}}|V_{s-1}|\), (14) implies that
Thus, combining all scenarios together gives that
Applying the bound in Eq. (1.2) of the supplement gives that there exists some constant \(C'>0\) such that
which further implies
Then by the Azuma–Hoeffding inequality, we have
Then the claim in (12) follows by setting \(b=C\sqrt{n\log n}(1+2/\beta +\log n)\), with \(C>2\sqrt{2}\).
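The step from a bound on the martingale differences to (12) is a routine Azuma–Hoeffding computation. As a simplified sketch, assume that \(|Z_t-Z_{t-1}|\le c_n\) for all \(1\le t\le n\), where \(c_n\) is a deterministic bound (the symbol \(c_n\) is introduced here for illustration; reading the choice of b above, it plays the role of \(1+2/\beta +\log n\)). The Azuma–Hoeffding inequality then gives, for any \(b>0\),
\[ \mathbb {P}\left( |Z_n-Z_0|\ge b\right) \le 2\exp \left( -\frac{b^2}{2\sum _{t=1}^{n}c_n^2}\right) = 2\exp \left( -\frac{b^2}{2nc_n^2}\right) . \]
Taking \(b=C\,c_n\sqrt{n\log n}\) yields
\[ 2\exp \left( -\frac{C^2c_n^2\,n\log n}{2nc_n^2}\right) = 2\exp \left( -\frac{C^2}{2}\log n\right) = 2n^{-C^2/2}, \]
and for \(C>2\sqrt{2}\) we have \(C^2/2>4\), so the right-hand side is smaller than \(2n^{-4}\) and tends to 0 as \(n\rightarrow \infty \).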
Then we are left with identifying the asymptotic limit of \(\mathbb {E}(N^{\mathrm{in}}_m(n))/n\). Consider the following approximation of the attachment probability:
Recall that
Applying the Chernoff bound again gives
for some constant \(C > 0\). Consider an in-degree sequence \(\left\{ \tilde{D}_i^\mathrm{in}(n)\right\} \) from a directed PA network with the set of parameters \((\alpha , \beta , \gamma , \tilde{\delta }_\mathrm{in}, \tilde{\delta }_\mathrm{out})\), as studied in Samorodnitsky et al. (2016) and Wan et al. (2017). An argument similar to Eq. (1.1) in the supplement gives
for some constant \(\tilde{C} > 0\). Note
By the developed Chernoff bounds, we have
Noticing that \(\sum _{k = i}^{n} k^{-3/2} \sqrt{\log {k}} < \infty \) as \(n \rightarrow \infty \), we complete the proof by applying the results derived in Wang and Resnick (2020). \(\square \)
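The summability used in the last step of the proof can be verified by a comparison argument; the following is a short check added for the reader rather than part of the original proof. Since \(\log k\le \sqrt{k}\) for all \(k\ge 1\), we have \(\sqrt{\log k}\le k^{1/4}\), and hence
\[ \sum _{k=i}^{n}k^{-3/2}\sqrt{\log k}\;\le \;\sum _{k=i}^{\infty }k^{-3/2}\,k^{1/4}\;=\;\sum _{k=i}^{\infty }k^{-5/4}\;<\;\infty , \]
because a p-series with exponent \(5/4>1\) converges. The same comparison shows that the tail sum vanishes as \(i\rightarrow \infty \).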
B. Validation of MLE
From the log-likelihood function in (11), we have the following score functions for \(\delta _{\mathrm{in}},\delta _{\mathrm{out}}\) and p, respectively.
We then set the score function (16) to 0. Note that due to the randomness of \(|V_{k-1}|\), the methodology given in Wan et al. (2017) is not directly applicable. Instead, we approximate the score function (16) as follows:
where
Therefore,
Since \(|V_{n-1}|/n\, \overset{a.s.}{\longrightarrow } \,1/(1-\beta )\), by the Cesàro convergence of random variables, we have \(|R_\text {in}(n)|/n\, \overset{a.s.}{\longrightarrow } \,0\). Then the approximate score equation in (16) becomes
Applying the method in Wan et al. (2017) further yields the following approximate score function:
where \(N^\text {in}_{>m}(n)\) denotes the number of nodes with in-degree strictly greater than m in \(\mathcal {H}_n\).
Similarly, the score equation with respect to (17) can be approximated by
with \(N^\text {out}_{>m}(n)\) being the number of nodes with out-degree strictly greater than m in \(\mathcal {H}_n\). However, with (19) and (20) available, the approximation to the third score equation in (18) leads to a deterministic solution of \(p=1\). This indicates that the approach to finding the MLE used in Wan et al. (2017) does not give the desired results here.
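Since the approximate score equations degenerate, one alternative is to maximize the exact likelihood numerically. The sketch below illustrates the idea on a simplified setting and should not be read as the authors' procedure: it assumes that every step adds a new node whose single edge points to an existing target, that the target is chosen with the mixture probability \(p\,(D^{\mathrm{in}}_w+\delta _{\mathrm{in}})/\sum _v(D^{\mathrm{in}}_v+\delta _{\mathrm{in}})+(1-p)/|V|\), and that the seed graph is one node with a self-loop. The names `simulate_targets`, `log_lik` and `grid_mle` are hypothetical, and a coarse grid search stands in for a proper numerical optimizer.

```python
import math
import random


def simulate_targets(n, p, delta_in, seed=0):
    """Simplified growth: each step a new node sends one edge to an existing
    target chosen by the hybrid rule. Returns one record per step:
    (in-degree of chosen target, number of nodes, number of edges),
    all measured just before the new edge is added."""
    rng = random.Random(seed)
    d_in = [1]   # assumed seed graph: one node with a self-loop
    edges = 1
    records = []
    for _ in range(n):
        if rng.random() < p:   # preferential attachment
            weights = [d + delta_in for d in d_in]
            w = rng.choices(range(len(d_in)), weights=weights)[0]
        else:                  # uniform attachment
            w = rng.randrange(len(d_in))
        records.append((d_in[w], len(d_in), edges))
        d_in[w] += 1
        d_in.append(0)         # the new node starts with in-degree 0
        edges += 1
    return records


def log_lik(records, p, delta_in):
    """Exact log-likelihood of the observed target choices under the
    assumed mixture attachment probability."""
    total = 0.0
    for d, n_nodes, n_edges in records:
        # Sum over nodes of (in-degree + delta_in) = edges + delta_in * nodes.
        pa = (d + delta_in) / (n_edges + delta_in * n_nodes)
        total += math.log(p * pa + (1.0 - p) / n_nodes)
    return total


def grid_mle(records, p_grid, delta_grid):
    """Return the grid point (p, delta_in) with the largest log-likelihood."""
    return max(((p, d) for p in p_grid for d in delta_grid),
               key=lambda theta: log_lik(records, theta[0], theta[1]))


records = simulate_targets(n=1000, p=0.6, delta_in=1.0, seed=1)
p_grid = [i / 10 for i in range(1, 10)]
delta_grid = [0.5, 1.0, 2.0, 4.0]
p_hat, delta_hat = grid_mle(records, p_grid, delta_grid)
print(p_hat, delta_hat)
```

Because the likelihood is evaluated exactly, the randomness of \(|V_{k-1}|\) needs no approximation; in practice the grid would be replaced by a derivative-free optimizer or a sampler, and the full model would add the out-degree and edge-scenario terms.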
About this article
Cite this article
Wang, T., Zhang, P. Directed hybrid random networks mixing preferential attachment with uniform attachment mechanisms. Ann Inst Stat Math 74, 957–986 (2022). https://doi.org/10.1007/s10463-022-00827-5
DOI: https://doi.org/10.1007/s10463-022-00827-5