On Triangle Estimation Using Tripartite Independent Set Queries

Theory of Computing Systems

Abstract

Estimating the number of triangles in a graph is one of the most fundamental problems in sublinear algorithms. In this work, we provide an algorithm that approximately counts the number of triangles in a graph using only polylogarithmically many queries when the number of triangles on any edge in the graph is polylogarithmically bounded. Our query oracle, Tripartite Independent Set (TIS), takes three pairwise disjoint sets of vertices A, B and C as input, and answers whether there exists a triangle having one vertex in each of these three sets. Our query model belongs to the class of group queries (Ron and Tsur ACM Trans. Comput. Theory 8(4), 15, 2016; Dell and Lapinskas 2018) and in particular is inspired by the Bipartite Independent Set (BIS) query oracle of Beame et al. (2018). We extend the algorithmic framework of Beame et al., with TIS replacing BIS, for approximately counting triangles in graphs.
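
The TIS query can be illustrated with a brute-force simulation on an explicitly stored graph. This is only a sketch: the oracle in this work is a black box, and the adjacency-set representation `adj` and the function name `tis_query` are assumptions introduced here for illustration.

```python
from itertools import product


def tis_query(adj, A, B, C):
    """Return True iff some triangle has one vertex in each of A, B and C.

    `adj` maps every vertex to the set of its neighbours (hypothetical
    representation; a real TIS oracle would not expose the graph).
    """
    A, B, C = set(A), set(B), set(C)
    if A & B or A & C or B & C:
        raise ValueError("A, B and C must be pairwise disjoint")
    for a, b in product(A, B):
        # An edge (a, b) plus a common neighbour in C closes a triangle.
        if b in adj[a] and adj[a] & adj[b] & C:
            return True
    return False


# Toy graph: the triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(tis_query(adj, {0}, {1}, {2}))  # True
print(tis_query(adj, {0}, {1}, {3}))  # False
```

The simulation inspects edges directly, so it says nothing about query complexity; it only fixes the input and output behaviour of a single query.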

Notes

  1. See http://www.wisdom.weizmann.ac.il/~oded/MC/237.html for a comment on BIS.

  2. \(\widetilde {\mathcal {O}}(\cdot )\) hides factors polynomial in \(\log n\) and \(\frac {1}{\epsilon }\), where \(\epsilon \in (0, 1)\) is such that \((1-\epsilon ) t \leq \hat {t} \leq (1+\epsilon )t\); \(\hat {t}\) and t denote the estimated and actual number of triangles in G, respectively.

  3. High probability means that the probability of success is at least \(1-\frac {1}{n^{c}}\) for some constant c.

  4. The threshold is a fixed polynomial in \(d, \log n\) and \(\frac {1}{\epsilon }\).

  5. Large refers to a fixed polynomial in \(d, \log n\) and \(\frac {1}{\epsilon }\).

  6. In our algorithm, k is a constant. However, Lemma 7 holds for any \(k \in \mathbb {N}\).

  7. For the exact statement of the Importance Sampling Lemma see Lemma 29 in Appendix B.

  8. Polylogarithmic refers to a polynomial in \(d, \log n\) and \(\frac {1}{\epsilon }\).

  9. Note that Δt is the number of triangles having t as one of their vertices, and we do not assume any bound on Δt. We only assume that ΔE, that is, the number of triangles on any edge, is bounded.

  10. The constant 10 is arbitrary. Any absolute constant greater than 1 would suffice.

  11. The constant in \(\mathcal {O}_{c}(\cdot )\) is a function of c. The result of Bhattacharya et al. is a high probability result. The exact bound in the paper of Dell et al. is \({\mathcal {O}}_{c} \left (\epsilon ^{-2}{\log ^{4c+7} n} \log \frac {1}{\delta } \right )\), where the probability of success of their algorithm is 1 − δ.

References

  1. Ahmed, N. K., Duffield, N., Neville, J., Kompella, R.: Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pp 1446–1455 (2014)

  2. Ahn, K. J., Guha, S., McGregor, A.: Graph sketches: sparsification, spanners, and subgraphs. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp 5–14 (2012)

  3. Alon, N., Yuster, R., Zwick, U.: Finding and counting given length cycles. Algorithmica 17(3), 209–223 (1997)

  4. Bhattacharya, A., Bishnu, A., Ghosh, A., Mishra, G.: Hyperedge estimation using polylogarithmic subset queries. arXiv:1908.04196 (2019)

  5. Buriol, L. S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., Sohler, C.: Counting triangles in data streams. In: Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp 253–262 (2006)

  6. Bishnu, A., Ghosh, A., Kolay, S., Mishra, G., Saurabh, S.: Parameterized query complexity of hitting set using stability of sunflowers. In: Proceedings of the 29th International Symposium on Algorithms and Computation, ISAAC, pp 25:1–25:12 (2018)

  7. Beame, P., Har-Peled, S., Ramamoorthy, S. N., Rashtchian, C., Sinha, M.: Edge estimation with independent set oracles. In: Proceedings of the 9th Innovations in Theoretical Computer Science Conference, ITCS, pp 38:1–38:21 (2018)

  8. Björklund, A., Pagh, R., Williams, V. V., Zwick, U.: Listing triangles. In: Proceedings of the 41st International Colloquium on Automata, Languages and Programming, ICALP, pp 223–234 (2014)

  9. Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp 623–632 (2002)

  10. Cormode, G., Jowhari, H.: A second look at counting triangles in graph streams (corrected). Theor. Comput. Sci. 683, 22–30 (2017)

  11. Choi, S. -S., Kim, J. H.: Optimal query complexity bounds for finding graphs. Artif. Intell. 174(9–10), 551–569 (2010)

  12. Dell, H., Lapinskas, J.: Fine-grained reductions from approximate counting to decision. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pp 281–288 (2018)

  13. Dell, H., Lapinskas, J., Meeks, K.: Approximately counting and sampling small witnesses using a colourful decision oracle. In: Chawla, S. (ed.) Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, SODA, pp 2201–2211 (2020)

  14. Dubhashi, D. P., Panconesi, A.: Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge (2009)

  15. Eden, T., Levi, A., Ron, D., Seshadhri, C.: Approximately counting triangles in sublinear time. SIAM J. Comput. 46(5), 1603–1646 (2017)

  16. Eden, T., Ron, D., Seshadhri, C.: On approximating the number of k-cliques in sublinear time. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pp 722–734 (2018)

  17. Feige, U.: On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM J. Comput. 35(4), 964–984 (2006)

  18. Goldreich, O., Ron, D.: Approximating average parameters of graphs. Random Struct. Algorithms 32(4), 473–493 (2008)

  19. Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear-time. SIAM J. Discrete Math. 25(3), 1365–1411 (2011)

  20. Itai, A., Rodeh, M.: Finding a minimum circuit in a graph. SIAM J. Comput. 7(4), 413–423 (1978)

  21. Janson, S.: Large deviations for sums of partly dependent random variables. Random Struct. Algorithms 24(3), 234–248 (2004)

  22. Jowhari, H., Ghodsi, M.: New streaming algorithms for counting triangles in graphs. In: Proceedings of the International Computing and Combinatorics Conference, COCOON, pp 710–716 (2005)

  23. Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pp 589–597 (2013)

  24. Kane, D. M., Mehlhorn, K., Sauerwald, T., Sun, H.: Counting arbitrary subgraphs in data streams. In: Proceedings of the 39th International Colloquium on Automata, Languages and Programming, ICALP, pp 598–609 (2012)

  25. Kallaugher, J., Price, E.: A hybrid sampling scheme for triangle counting. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp 1778–1797 (2017)

  26. Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.-L.: Counting and sampling triangles from a graph stream. Proc. VLDB Endow. 6(14), 1870–1881 (2013)

  27. Rubinstein, A., Schramm, T., Weinberg, S. M.: Computing exact minimum cuts without knowing the graph. In: Proceedings of the 9th Innovations in Theoretical Computer Science Conference, ITCS, pp 39:1–39:16 (2018)

  28. Ron, D., Tsur, G.: The power of an example: hidden set size approximation using group queries and conditional sampling. ACM Trans. Comput. Theory 8(4), 15 (2016)

  29. Stockmeyer, L.: The complexity of approximate counting. In: Proceedings of the 15th Annual ACM Symposium on Theory of Computing, STOC, pp 118–126 (1983)

  30. Stockmeyer, L.: On approximation algorithms for #P. SIAM J. Comput. 14(4), 849–861 (1985)

  31. Tangwongsan, K., Pavan, A., Tirthapura, S.: Parallel triangle counting in massive streaming graphs. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, CIKM, pp 781–786 (2013)

Acknowledgments

Anup Bhattacharya is supported by NPDF fellowship (No. PDF/2018/002072), Government of India. Arijit Ghosh is supported by Ramanujan Fellowship (No. SB/S2/RJN-064/2015), India. We thank the anonymous reviewers of ISAAC 2019 and TOCS for helping to improve the results and presentation of the paper.

Author information

Corresponding author

Correspondence to Arijit Ghosh.

Additional information

A preliminary version of this paper was accepted at ISAAC’19.

Appendices

Appendix A: Scenario Where ΔE is Bounded

In this section, we discuss some scenarios where the number of triangles sharing an edge is bounded. Graphs with bounded degree are an obvious example. We explore some other scenarios.

  (i) Consider a graph G(P,E) such that the vertex set P corresponds to a subset of \(\mathbb {R}^{2}\) and (u,v) ∈ E if and only if the distance between u and v is exactly 1. The objective is to compute the number of triples of points from P forming an equilateral triangle with side length 1, that is, the number of triangles in G. Observe that at most two triangles can share an edge in G, that is, ΔE ≤ 2.

  (ii) Consider a graph G(P,E) such that the vertex set P corresponds to a set of points inside an N × N square in \(\mathbb {R}^{2}\) and (u,v) ∈ E if and only if the distance between u and v is at most 1. The objective is to compute the number of triples of points from P forming a triangle with each side of length at most 1, that is, the number of triangles in G. For large enough N, the number of triangles sharing an edge in G is bounded with high probability.

  (iii) Consider a graph G(V,E) representing a community sharing information. Each node has some information, and two nodes are connected if and only if there exists an edge between them. Nodes increase their information by sharing it with their neighbors in G. Observe that the information of a node is derived from its set of neighbors. So, if two nodes have a large number of common neighbors in G, then there is no need for an edge between them. Hence, the number of triangles on any edge in the graph is bounded. The objective is to compute the number of triangles in G, that is, the number of triples of nodes in G such that each pair of nodes is connected.

In (i) and (ii), the TIS oracle can be implemented very efficiently. We can answer a TIS query by running a standard plane sweep algorithm from computational geometry, which takes \(\mathcal {O}(n \log n)\) time.
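
Scenario (i) can be checked on a small point set. The sketch below builds the unit-distance graph by brute force (not by the plane sweep mentioned above) and reports the largest number of triangles on any edge. The function names, the floating-point tolerance `tol`, and the four sample points are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def unit_distance_graph(points, tol=1e-9):
    """Adjacency sets of the graph joining points at distance 1 (up to tol)."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if abs(math.dist(points[i], points[j]) - 1.0) <= tol:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def max_triangles_on_edge(adj):
    """Largest number of common neighbours over all edges (Delta_E)."""
    return max(
        (len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v),
        default=0,
    )


def count_triangles(adj):
    """Each triangle is seen once per edge, hence the division by 3."""
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3


# Two unit equilateral triangles glued along the edge (0,0)-(1,0).
h = math.sqrt(3) / 2
points = [(0.0, 0.0), (1.0, 0.0), (0.5, h), (0.5, -h)]
adj = unit_distance_graph(points)
print(max_triangles_on_edge(adj))  # 2
print(count_triangles(adj))        # 2
```

The shared edge lies on exactly two triangles, matching the bound ΔE ≤ 2 stated in (i).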

Appendix B: Some Probability Results

Proposition 23

Let X be a random variable. Then \(\mathbb {E}[X] \leq \sqrt {\mathbb {E}[X^{2}]}\).
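
For completeness, one standard way to see Proposition 23, assuming \(\mathbb {E}[X^{2}]\) is finite, uses the non-negativity of the variance:

$$ 0 \leq \operatorname{Var}(X) = \mathbb{E}[X^{2}] - \left( \mathbb{E}[X] \right)^{2} \quad \Longrightarrow \quad \mathbb{E}[X] \leq \left| \mathbb{E}[X] \right| \leq \sqrt{\mathbb{E}[X^{2}]}. $$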

Lemma 24 ([14, Theorem 7.1])

Let f be a function of n random variables X1,…,Xn such that

  (i) each Xi takes values from a set Ai,

  (ii) \(\mathbb {E}[f]\) is bounded, i.e., \(0 \leq \mathbb {E}[f] \leq M\), and

  (iii) \(\mathcal {B}\) is an event satisfying the following for each i ∈ [n] and all \(a_{i}, a^{\prime }_{i} \in A_{i}\):

    $$ \left| \mathbb{E}[f ~|~X_{1},\dots,X_{i-1},X_{i}=a_{i},\mathcal{B}^{c}] - \mathbb{E}[f ~|~X_{1},\dots,X_{i-1},X_{i}=a^{\prime}_{i},\mathcal{B}^{c}] \right|\leq c_{i}.$$

Then for any δ ≥ 0,

$${\mathbb{P}}\left( \left| f - \mathbb{E}[f] \right| > \delta + M{\mathbb{P}}(\mathcal{B}) \right) \leq e^{-{\delta^{2}}/{\sum\limits_{i=1}^{n} {c_{i}^{2}}}} + \mathbb{P}(\mathcal{B}).$$

Lemma 25 (Hoeffding’s inequality [14])

Let X1,…,Xn be n independent random variables such that Xi ∈ [ai,bi]. Then for \(X=\sum \limits _{i=1}^{n} X_{i}\), the following is true for any δ > 0.

$${\mathbb{P}} \left( \left| X - \mathbb{E}[X] \right| \geq \delta \right) \leq 2 \cdot e^{-{2\delta^{2}}/{\sum\limits_{i=1}^{n} (b_{i} - a_{i})^{2}}}.$$
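
To get a feel for the bound in Lemma 25, the following sketch evaluates its right-hand side and compares it with a seeded Monte Carlo estimate of the tail for sums of fair coin flips. The function names, the choice of fair coins, and the parameters n = 100, δ = 10 are assumptions made for illustration.

```python
import math
import random


def hoeffding_bound(delta, ranges):
    """Right-hand side of Lemma 25 for variables X_i in [a_i, b_i]."""
    denom = sum((b - a) ** 2 for a, b in ranges)
    return 2.0 * math.exp(-2.0 * delta ** 2 / denom)


def empirical_tail(n, delta, trials, seed=0):
    """Fraction of trials in which a sum of n fair coins deviates from n/2 by >= delta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.randint(0, 1) for _ in range(n))
        if abs(x - n / 2) >= delta:
            hits += 1
    return hits / trials


n, delta = 100, 10
bound = hoeffding_bound(delta, [(0, 1)] * n)
print(round(bound, 4))                  # 0.2707
print(empirical_tail(n, delta, 2000))   # empirical frequency, well below the bound
```

The bound is distribution-free, so it is expected to be loose for any particular distribution such as the fair coin used here.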

Lemma 26 (Chernoff-Hoeffding bound [14])

Let X1,…,Xn be independent random variables such that Xi ∈ [0, 1]. For \(X=\sum \limits _{i=1}^{n} X_{i}\) and \(\mu _{l} \leq \mathbb {E}[X] \leq \mu _{h}\), the following hold for any δ > 0.

  (i) \({\mathbb {P}} \left (X > \mu _{h} + \delta \right ) \leq e^{-2\delta ^{2}/n}\).

  (ii) \({\mathbb {P}} \left (X < \mu _{l} - \delta \right ) \leq e^{-2\delta ^{2} / n}\).

Lemma 27 ([14, Theorem 3.2])

Let X1,…,Xn be random variables such that \(a_{i} \leq X_{i} \leq b_{i}\) and \(X=\sum \limits _{i=1}^{n} X_{i}\). Let \(\mathcal {D}\) be the dependency graph, where \(V(\mathcal {D})=\{X_{1},\ldots ,X_{n}\}\) and \( E(\mathcal {D})= \{(X_{i},X_{j}): X_{i} \text { and } X_{j} \text { are dependent}\}\). Then for any δ > 0,

$$ {\mathbb{P}}(\left| X-\mathbb{E}[X] \right| \geq \delta) \leq 2e^{-2\delta^{2} / \left( \chi^{*}(\mathcal{D})\sum\limits_{i=1}^{n}(b_{i}-a_{i})^{2} \right)},$$

where \(\chi ^{*}(\mathcal {D})\) denotes the fractional chromatic number of \(\mathcal {D}\).

The following lemma directly follows from Lemma 27.

Lemma 28

Let X1,…,Xn be indicator random variables such that there are at most d many Xj’s on which an Xi depends and \(X=\sum \limits _{i=1}^{n} X_{i}\). Then for any δ > 0,

$${\mathbb{P}}(\left| X-\mathbb{E}[X] \right| \geq \delta) \leq 2e^{-2\delta^{2} / \left( (d+1)n \right)}.$$
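
The step from Lemma 27 to Lemma 28 can be spelled out as follows; this is a sketch of the standard argument rather than a quotation from the paper. Indicator variables satisfy \(a_{i}=0\) and \(b_{i}=1\), so \(\sum _{i=1}^{n}(b_{i}-a_{i})^{2}=n\). Every vertex of the dependency graph \(\mathcal {D}\) has degree at most d, so a greedy coloring uses at most d + 1 colors, and the fractional chromatic number is at most the chromatic number. Hence

$$ \chi^{*}(\mathcal{D}) \sum\limits_{i=1}^{n}(b_{i}-a_{i})^{2} \leq \chi(\mathcal{D}) \cdot n \leq (d+1)n, $$

and since a larger denominator in the exponent only weakens the bound, substituting this into Lemma 27 yields the inequality of Lemma 28.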

Lemma 29 (Importance sampling [7])

Let (D1,w1,e1),…, (Dr,wr,er) be the given structures, where each Di has an associated weight c(Di), satisfying

  (i) wi,ei ≥ 1 for all i ∈ [r];

  (ii) \(\frac {e_{i}}{\rho } \leq c(D_{i}) \leq e_{i} \rho \) for some ρ > 0 and all i ∈ [r]; and

  (iii) \(\sum \limits _{i=1}^{r} {w_{i}\cdot c(D_{i})} \leq M\).

Note that the exact values of the c(Di)’s are not known to us. Then there exists an algorithm that finds \((D^{\prime }_{1},w^{\prime }_{1},e^{\prime }_{1}),\ldots , (D^{\prime }_{s},w^{\prime }_{s},e^{\prime }_{s})\) such that, with probability at least 1 − δ, all three conditions above hold and

$$ \left| \sum\limits_{i=1}^{s} {w^{\prime}_{i}\cdot c(D^{\prime}_{i})} - \sum\limits_{i=1}^{r} {w_{i}\cdot c(D_{i})} \right| \leq \lambda S, $$

where \(S=\sum \limits _{i=1}^{r} {w_{i}\cdot c(D_{i})}\) and λ,δ > 0. The time complexity of the algorithm is \(\mathcal {O}(r)\) and \(s=\mathcal {O}\left (\frac {\rho ^{4} \log M \left (\log \log M + \log \frac {1}{\delta }\right )}{\lambda ^{2}}\right )\).

Cite this article

Bhattacharya, A., Bishnu, A., Ghosh, A. et al. On Triangle Estimation Using Tripartite Independent Set Queries. Theory Comput Syst 65, 1165–1192 (2021). https://doi.org/10.1007/s00224-021-10043-y
