Abstract
Estimating the number of triangles in a graph is one of the most fundamental problems in sublinear algorithms. In this work, we provide an algorithm that approximately counts the number of triangles in a graph using only polylogarithmic queries when the number of triangles on any edge in the graph is polylogarithmically bounded. Our query oracle Tripartite Independent Set (TIS) takes three disjoint sets of vertices A, B and C as inputs, and answers whether there exists a triangle having one endpoint in each of these three sets. Our query model generally belongs to the class of group queries (Ron and Tsur ACM Trans. Comput. Theory 8(4), 15, 2016; Dell and Lapinskas 2018) and in particular is inspired by the Bipartite Independent Set (BIS) query oracle of Beame et al. (2018). We extend the algorithmic framework of Beame et al., with TIS replacing BIS, for approximately counting triangles in graphs.
Notes
See http://www.wisdom.weizmann.ac.il/~oded/MC/237.html for a comment on BIS.
\(\widetilde {\mathcal {O}}(\cdot )\) hides a polynomial factor of \(\log n\) and \(\frac {1}{\epsilon }\), where 𝜖 ∈ (0, 1) is such that \((1-\epsilon ) t \leq \hat {t} \leq (1+\epsilon )t\); \(\hat {t}\) and t denote the estimated and actual number of triangles in G, respectively.
High probability means that the probability of success is at least \(1-\frac {1}{n^{c}}\) for some constant c.
The threshold is a fixed polynomial in \(d, \log n\) and \(\frac {1}{\epsilon }\).
Large refers to a fixed polynomial in \(d, \log n\) and \(\frac {1}{\epsilon }\).
In our algorithm, k is a constant. However, Lemma 7 holds for any \(k \in \mathbb {N}\).
For the exact statement of the Importance Sampling Lemma, see Lemma 29 in Appendix B.
Polylogarithmic refers to a polynomial in \(d, \log n\) and \(\frac {1}{\epsilon }\).
Note that Δt is the number of triangles having t as one of their vertices, and we do not assume any bound on Δt. We only assume that ΔE, that is, the number of triangles on any edge, is bounded.
The constant 10 is arbitrary. Any absolute constant more than 1 would have been good enough.
The constant in \(\mathcal {O}_{c}(\cdot )\) is a function of c. The result of Bhattacharya et al. is a high probability result. The exact bound in the paper of Dell et al. is \({\mathcal {O}}_{c} \left (\epsilon ^{-2}{\log ^{4c+7} n} \log \frac {1}{\delta } \right )\), where the probability of success of their algorithm is 1 − δ.
References
Ahmed, N. K., Duffield, N., Neville, J., Kompella, R.: Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pp 1446–1455 (2014)
Ahn, K. J., Guha, S., McGregor, A.: Graph sketches: sparsification, spanners, and subgraphs. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp 5–14 (2012)
Alon, N., Yuster, R., Zwick, U.: Finding and counting given length cycles. Algorithmica 17(3), 209–223 (1997)
Bhattacharya, A., Bishnu, A., Ghosh, A., Mishra, G.: Hyperedge estimation using polylogarithmic subset queries. arXiv:1908.04196 (2019)
Buriol, L. S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., Sohler, C.: Counting triangles in data streams. In: Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp 253–262 (2006)
Bishnu, A., Ghosh, A., Kolay, S., Mishra, G., Saurabh, S.: Parameterized query complexity of hitting set using stability of sunflowers. In: Proceedings of the 29th International Symposium on Algorithms and Computation, ISAAC, pp 25:1–25:12 (2018)
Beame, P., Har-Peled, S., Ramamoorthy, S. N., Rashtchian, C., Sinha, M.: Edge estimation with independent set oracles. In: Proceedings of the 9th Innovations in Theoretical Computer Science Conference, ITCS, pp 38:1–38:21 (2018)
Björklund, A., Pagh, R., Williams, V. V., Zwick, U.: Listing triangles. In: Proceedings of the 41st International Colloquium on Automata, Languages and Programming, ICALP, pp 223–234 (2014)
Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp 623–632 (2002)
Cormode, G., Jowhari, H.: A second look at counting triangles in graph streams (corrected). Theor. Comput. Sci. 683, 22–30 (2017)
Choi, S.-S., Kim, J. H.: Optimal query complexity bounds for finding graphs. Artif. Intell. 174(9–10), 551–569 (2010)
Dell, H., Lapinskas, J.: Fine-grained reductions from approximate counting to decision. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pp 281–288 (2018)
Dell, H., Lapinskas, J., Meeks, K.: Approximately counting and sampling small witnesses using a colourful decision oracle. In: Chawla, S. (ed.) Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, SODA, pp 2201–2211 (2020)
Dubhashi, D. P., Panconesi, A.: Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge (2009)
Eden, T., Levi, A., Ron, D., Seshadhri, C.: Approximately counting triangles in sublinear time. SIAM J. Comput. 46(5), 1603–1646 (2017)
Eden, T., Ron, D., Seshadhri, C.: On approximating the number of k-cliques in sublinear time. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pp 722–734 (2018)
Feige, U.: On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM J. Comput. 35(4), 964–984 (2006)
Goldreich, O., Ron, D.: Approximating average parameters of graphs. Random Struct. Algorithms 32(4), 473–493 (2008)
Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear-time. SIAM J. Discrete Math. 25(3), 1365–1411 (2011)
Itai, A., Rodeh, M.: Finding a minimum circuit in a graph. SIAM J. Comput. 7(4), 413–423 (1978)
Janson, S.: Large deviations for sums of partly dependent random variables. Random Struct. Algorithms 24(3), 234–248 (2004)
Jowhari, H., Ghodsi, M.: New streaming algorithms for counting triangles in graphs. In: Proceedings of the International Computing and Combinatorics Conference, pp 722–734 (2005)
Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pp 589–597 (2013)
Kane, D. M., Mehlhorn, K., Sauerwald, T., Sun, H.: Counting arbitrary subgraphs in data streams. In: Proceedings of the 39th International Colloquium on Automata, Languages and Programming, ICALP, pp 598–609 (2012)
Kallaugher, J., Price, E.: A hybrid sampling scheme for triangle counting. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp 1778–1797 (2017)
Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.-L.: Counting and sampling triangles from a graph stream. Proc. VLDB Endow. 6(14), 1870–1881 (2013)
Rubinstein, A., Schramm, T., Weinberg, S. M.: Computing exact minimum cuts without knowing the graph. In: Proceedings of the 9th Innovations in Theoretical Computer Science Conference, ITCS, pp 39:1–39:16 (2018)
Ron, D., Tsur, G.: The power of an example: hidden set size approximation using group queries and conditional sampling. ACM Trans. Comput. Theory 8(4), 15 (2016)
Stockmeyer, L.: The complexity of approximate counting. In: Proceedings of the 15th Annual ACM Symposium on Theory of Computing, STOC, pp 118–126 (1983)
Stockmeyer, L.: On approximation algorithms for #P. SIAM J. Comput. 14(4), 849–861 (1985)
Tangwongsan, K., Pavan, A., Tirthapura, S.: Parallel triangle counting in massive streaming graphs. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, CIKM, pp 781–786 (2013)
Acknowledgments
Anup Bhattacharya is supported by NPDF fellowship (No. PDF/2018/002072), Government of India. Arijit Ghosh is supported by Ramanujan Fellowship (No. SB/S2/RJN-064/2015), India. We thank the anonymous reviewers of ISAAC 2019 and TOCS for helping to improve the results and presentation of the paper.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version of this paper was accepted at ISAAC’19.
Appendices
Appendix A: Scenarios Where ΔE is Bounded
In this section, we discuss some scenarios where the number of triangles sharing an edge is bounded. An obvious example of such graphs is the class of graphs with bounded degree. We explore some other scenarios.
(i) Consider a graph G(P,E) such that the vertex set P corresponds to a subset of \(\mathbb {R}^{2}\) and (u,v) ∈ E if and only if the distance between u and v is exactly 1. The objective is to compute the number of triples of points from P forming an equilateral triangle having side length 1, that is, the number of triangles in G. Observe that there can be at most two triangles sharing an edge in G, that is, ΔE ≤ 2.
(ii) Consider a graph G(P,E) such that the vertex set P corresponds to a set of points inside an N × N square in \(\mathbb {R}^{2}\) and (u,v) ∈ E if and only if the distance between u and v is at most 1. The objective is to compute the number of triples of points from P forming a triangle with each side of length at most 1, that is, the number of triangles in G. For large enough N, the number of triangles sharing an edge in G is bounded with high probability.
(iii) Consider a graph G(V,E) representing a community that shares information. Each node has some information, and two nodes are connected if and only if there is an edge between them. Nodes increase their information by sharing it with their neighbors in G. Observe that the information of a node is derived from its set of neighbors. So, if two nodes have a large number of common neighbors in G, then there is no need for an edge between them. Hence, the number of triangles on any edge in the graph is bounded. The objective is to compute the number of triangles in G, that is, the number of triples of nodes in G such that each pair of nodes is connected.
In (i) and (ii), the TIS oracle can be implemented very efficiently. We can answer a TIS query by running a standard plane sweep algorithm from computational geometry, which takes \(\mathcal {O}(n \log n)\) time.
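To make the semantics of a TIS query in scenario (ii) concrete, the following is a minimal sketch. It deliberately uses a brute-force check over all triples rather than the plane sweep mentioned above, so it illustrates only what the oracle answers, not how to answer it in \(\mathcal {O}(n \log n)\) time. The function name `tis_query`, the dictionary `points`, and the sample coordinates are assumptions made for illustration.

```python
from itertools import product
from math import dist


def tis_query(points, A, B, C, radius=1.0):
    """Answer a TIS query on the graph where u, v are adjacent iff
    their Euclidean distance is at most `radius` (scenario (ii)).

    Returns True iff some a in A, b in B, c in C form a triangle.
    Brute force over all triples, O(|A||B||C|) time; this is NOT the
    plane sweep algorithm referred to in the text.
    """
    if (A & B) or (A & C) or (B & C):
        raise ValueError("A, B and C must be pairwise disjoint")

    def adjacent(u, v):
        return dist(points[u], points[v]) <= radius

    return any(
        adjacent(a, b) and adjacent(b, c) and adjacent(a, c)
        for a, b, c in product(A, B, C)
    )


# Hypothetical point set: 0, 1, 2 are pairwise within distance 1; 3 is far away.
points = {0: (0.0, 0.0), 1: (0.8, 0.0), 2: (0.4, 0.6), 3: (5.0, 5.0)}
print(tis_query(points, {0}, {1}, {2}))  # the triple (0, 1, 2) is a triangle
print(tis_query(points, {0}, {1}, {3}))  # vertex 3 has no neighbors
```

Passing `A`, `B` and `C` as Python sets keeps the disjointness check to three intersections; for scenario (i) the adjacency test would compare the distance with exactly 1 instead, which needs care with floating-point coordinates.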
Appendix B: Some Probability Results
Proposition 23
Let X be a random variable. Then \(\mathbb {E}[X] \leq \sqrt {\mathbb {E}[X^{2}]}\).
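One short way to see this, sketched here under the assumption that \(\mathbb {E}[X^{2}]\) is finite, uses the non-negativity of the variance:

$$ 0 \leq \operatorname{Var}(X) = \mathbb{E}[X^{2}] - \left(\mathbb{E}[X]\right)^{2} \quad \Longrightarrow \quad \mathbb{E}[X] \leq \left|\mathbb{E}[X]\right| \leq \sqrt{\mathbb{E}[X^{2}]}. $$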
Lemma 24
([14, Theorem 7.1]). Let f be a function of n random variables X1,…,Xn such that
(i) each Xi takes values from a set Ai,
(ii) \(\mathbb {E}[f]\) is bounded, i.e., \(0 \leq \mathbb {E}[f] \leq M\), and
(iii) \(\mathcal {B}\) is an event satisfying the following for each i ∈ [n] and all \(a_{i}, a^{\prime }_{i} \in A_{i}\):
$$ \left| \mathbb{E}[f ~|~X_{1},\dots,X_{i-1},X_{i}=a_{i},\mathcal{B}^{c}] - \mathbb{E}[f ~|~X_{1},\dots,X_{i-1},X_{i}=a^{\prime}_{i},\mathcal{B}^{c}] \right|\leq c_{i}.$$
Then for any δ ≥ 0,
Lemma 25 (Hoeffding’s inequality [14])
Let X1,…,Xn be n independent random variables such that Xi ∈ [ai,bi]. Then for \(X=\sum \limits _{i=1}^{n} X_{i}\), the following is true for any δ > 0.
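For reference, the standard two-sided form of Hoeffding's inequality for such variables is sketched below. This is the textbook statement, and the constants in the authors' version may differ.

$$ \mathbb{P}\left( \left| X - \mathbb{E}[X] \right| \geq \delta \right) \leq 2\exp\left( -\frac{2\delta^{2}}{\sum_{i=1}^{n} (b_{i}-a_{i})^{2}} \right). $$

When every Xi lies in [0, 1], the denominator equals n, which matches the exponent appearing in Lemma 26.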
Lemma 26 (Chernoff-Hoeffding bound [14])
Let X1,…,Xn be independent random variables such that Xi ∈ [0, 1]. For \(X=\sum \limits _{i=1}^{n} X_{i}\) and \(\mu _{l} \leq \mathbb {E}[X] \leq \mu _{h}\), the following hold for any δ > 0.
(i) \({\mathbb {P}} \left (X > \mu _{h} + \delta \right ) \leq e^{-2\delta ^{2}/n}\).
(ii) \({\mathbb {P}} \left (X < \mu _{l} - \delta \right ) \leq e^{-2\delta ^{2} / n}\).
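As a sanity check of part (i), the following minimal sketch compares the exact upper tail of a binomial random variable (a sum of independent Bernoulli variables, so each Xi ∈ [0, 1]) with the bound \(e^{-2\delta ^{2}/n}\). The parameter values and helper names are assumptions chosen for illustration, and the check takes \(\mu _{l} = \mu _{h} = \mathbb {E}[X] = np\).

```python
from math import comb, exp


def binomial_upper_tail(n, p, threshold):
    """Exact P(X > threshold) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k > threshold
    )


def chernoff_hoeffding_bound(n, delta):
    """Right-hand side e^{-2 delta^2 / n} of Lemma 26."""
    return exp(-2 * delta**2 / n)


# Hypothetical parameters: 100 fair coin flips, deviation of 10 above the mean.
n, p, delta = 100, 0.5, 10
mu_h = n * p  # here mu_l = mu_h = E[X]
tail = binomial_upper_tail(n, p, mu_h + delta)
bound = chernoff_hoeffding_bound(n, delta)
print(f"exact tail = {tail:.6f}, bound = {bound:.6f}")
```

The exact tail is computed by summing the probability mass function, so no randomness is involved and the comparison is reproducible.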
Lemma 27
[14, Theorem 3.2] Let X1,…,Xn be random variables such that ai ≤ Xi ≤ bi and \(X=\sum \limits _{i=1}^{n} X_{i}\). Let \(\mathcal {D}\) be the dependency graph, where \(V(\mathcal {D})=\{X_{1},\ldots ,X_{n}\}\) and \( E(\mathcal {D})= \{(X_{i},X_{j}): X_i \text { and } X_j \text { are dependent}\}\). Then for any δ > 0,
where \(\chi ^{*}(\mathcal {D})\) denotes the fractional chromatic number of \(\mathcal {D}\).
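The following lemma follows directly from Lemma 27: when each Xi depends on at most d other variables, the dependency graph \(\mathcal {D}\) has maximum degree at most d, so \(\chi ^{*}(\mathcal {D}) \leq \chi (\mathcal {D}) \leq d+1\).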
Lemma 28
Let X1,…,Xn be indicator random variables such that each Xi depends on at most d of the Xj’s, and let \(X=\sum \limits _{i=1}^{n} X_{i}\). Then for any δ > 0,
Lemma 29 (Importance sampling [7])
Let (D1,w1,e1),…, (Dr,wr,er) be the given structures, where each Di has an associated weight c(Di) satisfying
(i) wi,ei ≥ 1,∀i ∈ [r];
(ii) \(\frac {e_{i}}{\rho } \leq c(D_{i}) \leq e_{i} \rho \) for some ρ > 0 and all i ∈ [r]; and
(iii) \(\sum \limits _{i=1}^{r} {w_{i}\cdot c(D_{i})} \leq M\).
Note that the exact values of the c(Di)’s are not known to us. Then there exists an algorithm that finds \((D^{\prime }_{1},w^{\prime }_{1},e^{\prime }_{1}),\ldots , (D^{\prime }_{s},w^{\prime }_{s},e^{\prime }_{s})\) such that, with probability at least 1 − δ, all three of the above conditions hold and
where \(S=\sum \limits _{i=1}^{r} {w_{i}\cdot c(D_{i})}\) and λ,δ > 0. The time complexity of the algorithm is \(\mathcal {O}(r)\) and \(s=\mathcal {O}\left (\frac {\rho ^{4} \log M \left (\log \log M + \log \frac {1}{\delta }\right )}{\lambda ^{2}}\right )\).
Cite this article
Bhattacharya, A., Bishnu, A., Ghosh, A. et al. On Triangle Estimation Using Tripartite Independent Set Queries. Theory Comput Syst 65, 1165–1192 (2021). https://doi.org/10.1007/s00224-021-10043-y
DOI: https://doi.org/10.1007/s00224-021-10043-y