Sampling methods and estimation of triangle count distributions in large networks

Nelson Antunes; Tianjian Guo; Vladas Pipiras

doi:10.1017/nws.2021.2

Published online by Cambridge University Press: 26 February 2021

Nelson Antunes*: Center for Computational and Stochastic Mathematics, University of Lisbon, Avenida Rovisco Pais 1049-001, Lisbon, Portugal; University of Algarve, Faro, Portugal
Tianjian Guo: Department of Statistics and Operations Research, University of North Carolina, CB 3260, Chapel Hill, NC 27599, USA (e-mail: Tianjian.Guo@mccombs.utexas.edu)
Vladas Pipiras: Department of Statistics and Operations Research, University of North Carolina, CB 3260, Chapel Hill, NC 27599, USA (e-mail: pipiras@email.unc.edu)
*Corresponding author. Email: nantunes@ualg.pt

Abstract

This paper investigates the distributions of triangle counts per vertex and per edge as a means for network description, analysis, model building, and other tasks. The main interest is in estimating these distributions through sampling, especially for large networks. A novel sampling method tailored to the estimation analysis is proposed, with three sampling designs motivated by several network access scenarios. An estimation method based on inversion and an asymptotic method are developed to recover the entire distribution. A single method to estimate the distribution from multiple samples is also considered. Algorithms are presented to sample the network under the various access scenarios. Finally, the estimation methods are evaluated on synthetic and real-world networks in a data study.
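
To make the target quantities concrete, the following Python sketch computes the triangle count of every vertex and every edge of a small undirected graph, then tabulates the two empirical distributions. It only illustrates the quantities being estimated and is not the sampling or estimation method of the paper. The function names (`triangle_counts`, `empirical_distribution`) and the example edge list are assumptions made for this illustration, and vertices are assumed to be comparable labels such as integers.

```python
from collections import Counter


def triangle_counts(edges):
    """Return (per_vertex, per_edge) triangle counts of an undirected simple graph.

    Illustrative helper (hypothetical name); edges is an iterable of (u, v) pairs.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Triangles through an edge (u, v) = number of common neighbours of u and v.
    per_edge = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                per_edge[(u, v)] = len(adj[u] & adj[v])

    # Each triangle containing a vertex uses exactly two of its incident edges,
    # so summing the incident edge counts counts every such triangle twice.
    per_vertex = {v: 0 for v in adj}
    for (u, v), t in per_edge.items():
        per_vertex[u] += t
        per_vertex[v] += t
    per_vertex = {v: t // 2 for v, t in per_vertex.items()}
    return per_vertex, per_edge


def empirical_distribution(counts):
    """Map each triangle count k to the fraction of items having count k."""
    freq = Counter(counts.values())
    total = sum(freq.values())
    return {k: freq[k] / total for k in sorted(freq)}


# Hypothetical example graph: two triangles sharing the edge (1, 2), plus a pendant vertex 4.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
per_vertex, per_edge = triangle_counts(edges)
vertex_dist = empirical_distribution(per_vertex)
edge_dist = empirical_distribution(per_edge)
print("vertex distribution:", vertex_dist)
print("edge distribution:", edge_dist)
```

On a full graph these distributions are computed exactly; under sampling only part of the graph is observed, which is why an estimation step such as the inversion or asymptotic approach named in the abstract is needed.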

Keywords

triangles; random sampling; distribution estimation; inversion approach; asymptotic approach; multiple samples; static and streaming graphs; power laws

Type: Research Article
Information: Network Science, Volume 9, Special Issue S1: Complex Networks 2019, October 2021, pp. S134-S156

DOI: https://doi.org/10.1017/nws.2021.2
Copyright: © The Author(s), 2021. Published by Cambridge University Press

Footnotes

Action Editor: Hocine Cherifi
