Not all FPRASs are equal: demystifying FPRASs for DNF-counting

Meel, Kuldeep S.; Shrotri, Aditya A.; Vardi, Moshe Y.

doi:10.1007/s10601-018-9301-x

Published: 26 December 2018

Constraints, Volume 24, pages 211–233 (2019)

Abstract

The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in artificial intelligence with applications in diverse domains ranging from network reliability to probabilistic databases. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques for #DNF. Consequently, several Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques also lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do the various approaches compare empirically? In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of the runtime behavior of different FPRASs for #DNF. Our study yields a nuanced picture. First, we observe that no single algorithm outperforms all others for all classes of formulas and input parameters. Second, we observe that the algorithm with one of the worst time complexities solves the largest number of benchmarks.
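To make the problem concrete, the following Python sketch (ours, not code from the paper) counts the solutions of a small DNF formula exactly by brute force and estimates the same count with a Karp–Luby style Monte Carlo estimator (Karp and Luby; Karp, Luby and Madras). The cube encoding and the names count_exact and karp_luby are hypothetical, and the fixed sample count is a simplification: it does not reproduce the (ε, δ) stopping rules of the actual FPRASs.

```python
import itertools
import random

# Hypothetical encoding: a cube maps a variable index to the polarity of its
# literal (True for x_v, False for NOT x_v); a DNF formula is a list of cubes.
Cube = dict[int, bool]


def satisfies(x: tuple[bool, ...], cube: Cube) -> bool:
    """True if assignment x satisfies every literal of the cube."""
    return all(x[v] == pol for v, pol in cube.items())


def count_exact(cubes: list[Cube], n: int) -> int:
    """Exact #DNF by enumerating all 2^n assignments (exponential time)."""
    return sum(
        any(satisfies(x, c) for c in cubes)
        for x in itertools.product([False, True], repeat=n)
    )


def karp_luby(cubes: list[Cube], n: int, samples: int, rng: random.Random) -> float:
    """Karp-Luby style Monte Carlo estimate of #DNF (fixed sample size)."""
    sizes = [2 ** (n - len(c)) for c in cubes]  # |Sol(F_i)| for each cube
    total = sum(sizes)                          # |U| = sum of cube sizes
    hits = 0
    for _ in range(samples):
        # Draw (i, x) uniformly from U: cube i with prob. |Sol(F_i)| / |U| ...
        i = rng.choices(range(len(cubes)), weights=sizes)[0]
        # ... then x uniformly among the assignments that satisfy cube i.
        x = tuple(cubes[i][v] if v in cubes[i] else rng.random() < 0.5
                  for v in range(n))
        # Count the pair only if i is the first cube that x satisfies, so that
        # every solution x of the formula is counted exactly once in U.
        first = next(j for j, c in enumerate(cubes) if satisfies(x, c))
        hits += first == i
    return total * hits / samples


# (x0 AND x1) OR (x1 AND NOT x2) OR x3 over n = 4 variables
F = [{0: True, 1: True}, {1: True, 2: False}, {3: True}]
print(count_exact(F, 4))  # 11
print(karp_luby(F, 4, 20_000, random.Random(0)))
```

The estimator's cost per sample is one pass over the cubes, which is where the factor m in the running times discussed below comes from.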

Notes

Note that, in the context of this paper, \({\mathcal {A}}\) is typically represented implicitly, for example by constraints in DNF.
Code and results can be accessed at https://gitlab.com/Shrotri/DNF_Counting
Figures are best viewed online in color.

References

[1] Dueñas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y. (2017). Counting-based reliability estimation for power-transmission grids. In Proceedings of AAAI Conference on Artificial Intelligence (AAAI).
[2] Bacchus, F., Dalmao, S., Pitassi, T. (2003). Algorithms and complexity results for #SAT and Bayesian inference. In Proceedings of FOCS (pp. 340–351). ISBN: 0-7695-2040-5. http://dl.acm.org/citation.cfm?id=946243.946291.
[3] Sang, T., Beame, P., Kautz, H. (2005). Performing Bayesian inference by weighted model counting. In Proceedings of AAAI (pp. 475–481).
[4] Dalvi, N., & Suciu, D. (2007). Efficient query evaluation on probabilistic databases. The VLDB Journal, 16(4), 523–544.
[5] Biondi, F., Enescu, M., Heuser, A., Legay, A., Meel, K.S., Quilbeuf, J. (2018). Scalable approximation of quantitative information flow in programs. In Proceedings of VMCAI.
[6] Karger, D.R. (2001). A randomized fully polynomial time approximation scheme for the all-terminal network reliability problem. SIAM Review.
[7] Valiant, L.G. (1979). The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3), 410–421.
[8] Karp, R.M., & Luby, M. (1983). Monte Carlo algorithms for enumeration and reliability problems. In Proceedings of FOCS.
[9] Karp, R.M., Luby, M., Madras, N. (1989). Monte Carlo approximation algorithms for enumeration problems. Journal of Algorithms, 10(3), 429–448.
[10] Vazirani, V.V. (2013). Approximation algorithms. Springer Science & Business Media.
[11] Dagum, P., Karp, R., Luby, M., Ross, S. (2000). An optimal algorithm for Monte Carlo estimation. SIAM Journal on Computing, 29(5), 1484–1496.
[12] Chakraborty, S., Meel, K.S., Vardi, M.Y. (2016). Algorithmic improvements in approximate counting for probabilistic inference: from linear to logarithmic SAT calls. In Proceedings of IJCAI.
[13] Meel, K.S., Shrotri, A.A., Vardi, M.Y. (2017). On hashing-based approaches to approximate DNF-counting. In Proceedings of FSTTCS.
[14] Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B. (2013). Taming the curse of dimensionality: discrete integration by hashing and optimization. In Proceedings of ICML (pp. 334–342).
[15] Meel, K.S. (2018). Constrained counting and sampling: bridging the gap between theory and practice. arXiv:1806.02239.
[16] Carter, J.L., & Wegman, M.N. (1977). Universal classes of hash functions. In Proceedings of STOC (pp. 106–112). ACM.
[17] Luby, M., & Veličković, B. (1996). On deterministic approximation of DNF. Algorithmica, 16(4), 415–433.
[18] Trevisan, L. (2004). A note on approximate counting for k-DNF. In Approximation, randomization, and combinatorial optimization. Algorithms and techniques (pp. 417–425). Springer.
[19] Gopalan, P., Meka, R., Reingold, O. (2013). DNF sparsification and a faster deterministic counting algorithm. Computational Complexity.
[20] Ajtai, M., & Wigderson, A. (1985). Deterministic simulation of probabilistic constant depth circuits. In Proceedings of FOCS (pp. 11–19). IEEE.
[21] Nisan, N. (1991). Pseudorandom bits for constant depth circuits. Combinatorica, 11(1), 63–70.
[22] De, A., Etesami, O., Trevisan, L., Tulsiani, M. (2010). Improved pseudorandom generators for depth 2 circuits. In Approximation, randomization, and combinatorial optimization. Algorithms and techniques (pp. 504–517). Springer.
[23] Olteanu, D., Huang, J., Koch, C. (2010). Approximate confidence computation in probabilistic databases. In Proceedings of ICDE (pp. 145–156). IEEE.
[24] Fink, R., & Olteanu, D. (2011). On the optimal approximation of queries using tractable propositional languages. In Proceedings of ICDT. ACM.
[25] Gatterbauer, W., & Suciu, D. (2014). Oblivious bounds on the probability of Boolean functions. ACM TODS, 39(1), 5.
[26] Tao, Q., Scott, S., Vinodchandran, N.V., Osugi, T.T. (2004). SVM-based generalized multiple-instance learning via approximate box counting. In Proceedings of the twenty-first international conference on machine learning (p. 101). ACM.
[27] Babai, L. (1979). Monte-Carlo algorithms in graph isomorphism testing. Université de Montréal Technical Report DMS 79-10.
[28] Motwani, R., & Raghavan, P. (2010). Randomized algorithms.
[29] Albrecht, M., & Bard, G. (2012). The M4RI Library – Version 20121224. http://m4ri.sagemath.org.
[30] Huang, J., Antova, L., Koch, C., Olteanu, D. (2009). MayBMS: a probabilistic database management system. In Proceedings of SIGMOD. ACM.
[31] TPC Benchmark H. http://www.tpc.org/.
[32] Mitchell, D., Selman, B., Levesque, H. (1992). Hard and easy distributions of SAT problems. In Proceedings of AAAI (pp. 459–465).
[33] Thurley, M. (2006). SharpSAT: counting models with advanced component caching and implicit BCP. In Proceedings of SAT (pp. 424–429).
[34] Chakraborty, S., Meel, K.S., Vardi, M.Y. (2013). A scalable approximate model counter. In Proceedings of CP (pp. 200–216).

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions. Moshe Y. Vardi and Aditya A. Shrotri’s work was supported in part by NSF grant IIS-1527668 and the NSF Expeditions in Computing project “ExCAPE: Expeditions in Computer Augmented Program Engineering”. Kuldeep S. Meel’s work was supported in part by NUS ODPRT Grant R-252-000-685-133, AI Singapore Grant R-252-000-A16-490, and the Sung Kah Kay Assistant Professorship Fund.

Author information

Authors and Affiliations

National University of Singapore, Singapore, Singapore
Kuldeep S. Meel
Rice University, Houston, TX, USA
Aditya A. Shrotri & Moshe Y. Vardi

Corresponding author

Correspondence to Aditya A. Shrotri.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author names are ordered alphabetically by last name and do not indicate contribution.

Appendix

To obtain a concrete algorithm from the framework described in Algorithm 2, we need to instantiate the sub-procedures SampleHashFunction, GetLowerBound, GetUpperBound, EnumerateNextSol, ExtractSlice and ComputeIncrement for a particular counting problem. We now show how SymbolicDNFApproxMC [13], which uses Row Echelon XOR hash functions together with the concepts of Symbolic Hashing and Stochastic Cell-Counting, can be obtained through such instantiations. We then prove that replacing the BinarySearch procedure with ReverseSearch improves the complexity of the resulting algorithm by polylog factors.

1.1 SampleHashFunction

One can directly invoke the procedure SampleBase described in Algorithm 4 of [13], with minor modifications, as shown in Algorithm 7. Note that the hash function (A, b, y) obtained in this way belongs to the Row Echelon XOR family.
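As an illustration of the Row Echelon XOR family, the following sketch assumes, following our reading of [13], that the matrix has the form A = [I_k | D] over GF(2), with D and b drawn uniformly at random; the extra component y returned by Algorithm 7 is not modeled. The names sample_row_echelon_xor and hash_value are hypothetical.

```python
import itertools
import random


def sample_row_echelon_xor(k: int, n: int, rng: random.Random):
    """Sample h(x) = A x + b over GF(2), with A = [I_k | D] (assumed form).

    D (k x (n - k)) and b (k bits) are drawn uniformly at random.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    D = [[rng.randrange(2) for _ in range(n - k)] for _ in range(k)]
    # Prepend the identity block so that row r has its pivot in column r.
    A = [[int(r == col) for col in range(k)] + D[r] for r in range(k)]
    b = [rng.randrange(2) for _ in range(k)]
    return A, b


def hash_value(A, b, x):
    """Evaluate A x + b over GF(2) for an assignment x given as n bits."""
    return [(sum(a & xi for a, xi in zip(row, x)) + bi) % 2
            for row, bi in zip(A, b)]


# Because of the identity block, every cell of h holds exactly 2^(n - k)
# assignments; here we check one cell by brute force.
A, b = sample_row_echelon_xor(3, 6, random.Random(0))
cell = [x for x in itertools.product([0, 1], repeat=6)
        if hash_value(A, b, x) == [0, 0, 0]]
print(len(cell))  # 8
```

The identity block is what makes cells easy to handle: the first k bits of any solution are determined by the remaining n − k bits, so every cell has the same size regardless of the random choice of D and b.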

1.2 Lower and upper bounds

As shown in [13], it suffices to search between \( {\mathsf {n}} - {\mathsf {w}} - \log {\mathsf {hiThresh}} \) and \( {\mathsf {n}} - {\mathsf {w}} + \log {\mathsf {m}} - \log {\mathsf {hiThresh}} \) hash constraints. Therefore, GetLowerBound and GetUpperBound return these two values, respectively.
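The following arithmetic sketch evaluates the two bounds for sample parameters. It assumes base-2 logarithms and reads w as the smallest number of literals in any cube, so that \(2^{n-w} \le \#F \le m \cdot 2^{n-w}\); this interpretation is ours, since the appendix does not define w. Under that reading, k hash constraints leave roughly \(\#F/2^{k}\) solutions per cell, and asking for about hiThresh of them places k in the stated range, whose width is log m. The function name search_bounds is hypothetical.

```python
import math


def search_bounds(n: int, w: int, m: int, hi_thresh: float) -> tuple[float, float]:
    """GetLowerBound / GetUpperBound as stated in the text (log base 2 assumed)."""
    lower = n - w - math.log2(hi_thresh)
    upper = n - w + math.log2(m) - math.log2(hi_thresh)
    return lower, upper


lower, upper = search_bounds(n=50, w=10, m=1024, hi_thresh=64)
print(lower, upper)  # 34.0 44.0
# The range spans log2(m) + 1 integer candidates, so a binary search over it
# needs about log2(log2(m) + 1) probes.
probes = math.ceil(math.log2(upper - lower + 1))
print(probes)  # 4
```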

1.3 Extracting a prefix slice

Procedure ExtractSlice, required by ReverseSearch, is shown in Algorithm 8. If flip is false, ExtractSlice directly returns the result of the procedure Extract (described in [13]). Otherwise, the p-th bit of y is negated before being passed to Extract.

1.4 EnumerateNextSol

SymbolicDNFApproxMC enumerates the solutions in a cell in Gray-code order, which improves the complexity: each successive solution is obtained in \( \mathcal {O}(n) \) time. This is achieved by invoking the procedure enumREX (Algorithm 1 in [13]).
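enumREX itself is specified in [13]; the sketch below only illustrates the idea behind Gray-code enumeration of a cell, assuming (our assumption) that the hash matrix has the row-echelon form A = [I_k | D]. Then the first k bits of every solution are determined by the n − k free bits, and visiting the free bits in reflected Gray-code order changes exactly one free bit per step, so the k pivot bits are updated by XOR with a single column of D. The name enumerate_cell and the argument c (the target cell combined with b) are hypothetical.

```python
def enumerate_cell(D, c, n):
    """Yield every x in {0,1}^n with x[:k] = c XOR D x[k:] over GF(2).

    This is the cell {x : [I_k | D] x = c} of a row-echelon XOR hash.
    Free bits x[k:] are visited in reflected Gray-code order, so each step
    flips exactly one free bit and updates the k pivot bits with one
    column of D.
    """
    k = len(c)
    free = [0] * (n - k)
    pivot = list(c)  # with all free bits 0, the pivot bits equal c
    yield tuple(pivot + free)
    for t in range(1, 2 ** (n - k)):
        j = (t & -t).bit_length() - 1  # Gray code t-1 -> t flips bit j
        free[j] ^= 1
        for r in range(k):
            pivot[r] ^= D[r][j]  # D x_free changes by column j of D
        yield tuple(pivot + free)


D = [[1, 0, 1], [0, 1, 1]]  # k = 2 pivot rows, n - k = 3 free columns
for x in enumerate_cell(D, [1, 0], n=5):
    print(x)
```

Each step costs O(k) bit updates plus an O(n) copy of the assignment, which matches the per-solution O(n) cost stated above.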

1.5 ComputeIncrement

Procedure CheckSAT (Algorithm 10, adapted from [13]) can be used to compute the increments to Y_cell, as shown in Algorithm 9. The assignment s is split into a solution x and a cube Fⁱ using the same Interpret function as in line 7 of Algorithm 6 in [13]. CheckSAT samples a cube at random in line 3 and checks whether the assignment x satisfies it in line 5. The returned value follows a geometric distribution [9] and can be used to compute an accurate probabilistic estimate Y_cell of the true number of solutions in the cell [13].
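The following sketch illustrates why the geometric count yields an estimate of the number of distinct solutions. It assumes that CheckSAT draws cubes uniformly with replacement and uses a hypothetical cube encoding (variable index mapped to literal polarity). If x satisfies cov(x) of the m cubes, the trial count c_x has mean m/cov(x), so c_x/m contributes 1/cov(x) in expectation; since x appears in exactly cov(x) pairs (x, Fⁱ), the total over all pairs is 1 per distinct solution. The sketch takes the pairs directly instead of modeling Interpret, treats the whole space as one cell, and its function names are ours.

```python
import itertools
import random


def check_sat(x, cubes, rng):
    """Number of trials until a randomly chosen cube is satisfied by x.

    Cubes are drawn uniformly with replacement (an assumption of this sketch).
    If x satisfies cov(x) of the m cubes, the count is geometric with success
    probability cov(x)/m, so its mean is m/cov(x). x must satisfy some cube.
    """
    m = len(cubes)
    trials = 0
    while True:
        trials += 1
        cube = cubes[rng.randrange(m)]
        if all(x[v] == pol for v, pol in cube.items()):
            return trials


def stochastic_cell_count(pairs, cubes, rng):
    """Sum of c_x / m over all (x, i) pairs in a cell.

    Each pair contributes 1/cov(x) in expectation, and x appears in cov(x)
    pairs, so the sum is an unbiased estimate of the number of distinct x.
    """
    m = len(cubes)
    return sum(check_sat(x, cubes, rng) / m for x, _ in pairs)


F = [{0: True, 1: True}, {1: True, 2: False}, {3: True}]
# Treat the whole space {0,1}^4 as a single cell, for illustration only.
pairs = [(x, i) for x in itertools.product([False, True], repeat=4)
         for i, c in enumerate(F) if all(x[v] == p for v, p in c.items())]
rng = random.Random(0)
runs = [stochastic_cell_count(pairs, F, rng) for _ in range(2000)]
print(len(pairs), sum(runs) / len(runs))  # 16 pairs; the mean is close to 11
```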

Lemma 1

The complexity of BSAT is \(\mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}})\).

Proof

Y_cell is incremented by c_x/m in line 5 of BSAT after a call to ComputeIncrement and CheckSAT. Since BSAT returns once Y_cell reaches threshold, the sum of c_x over all invocations of CheckSAT is m ⋅ threshold. Every time c_x is incremented, the check in line 5 of CheckSAT is performed, which takes \( \mathcal {O}(n) \) time. Moreover, each call to EnumerateNextSol takes \( \mathcal {O}(n) \) time, as enumREX in [13] does, and is followed by one invocation of CheckSAT; since every c_x is at least 1, there are at most m ⋅ threshold such calls. As a result, the complexity of BSAT is \( \mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}}) \). \(\ \Box \)
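The accounting in this proof can be checked on a small sketch of the loop (a hypothetical reconstruction, not the paper's BSAT). Keeping Y_cell as an exact fraction makes two facts visible: the total number of CheckSAT trials always equals m ⋅ Y_cell, and because every c_x is at least 1, the number of enumerated solutions never exceeds that total. The loop stops at the first moment Y_cell ≥ threshold, so the total exceeds m ⋅ threshold by less than the final c_x. A geometric stand-in replaces CheckSAT, and all names are ours.

```python
import random
from fractions import Fraction


def geometric_trials(rng: random.Random, p: float) -> int:
    """Stand-in for CheckSAT: trials until the first success (probability p)."""
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials


def bsat(cell, compute_increment, m, threshold):
    """Hypothetical sketch of BSAT's main loop.

    Returns (Y_cell, enumerated solutions, total CheckSAT trials).
    """
    y_cell = Fraction(0)
    enumerated = 0
    total_trials = 0
    for s in cell:                   # plays the role of EnumerateNextSol
        enumerated += 1
        c = compute_increment(s)     # c_x >= 1 trials, O(n) work each
        total_trials += c
        y_cell += Fraction(c, m)     # Y_cell += c_x / m
        if y_cell >= threshold:      # return once Y_cell reaches threshold
            break
    return y_cell, enumerated, total_trials


rng = random.Random(0)
y, enumerated, trials = bsat(range(10_000), lambda s: geometric_trials(rng, 0.25),
                             m=4, threshold=50)
print(float(y), enumerated, trials)
```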

Lemma 2

The complexity of ReverseSearch is \(\mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {hiThresh}})\).

Proof

In ReverseSearch, BSAT is invoked with a different threshold (say \(T_{1}, T_{2}, T_{3}, \ldots\)) in each iteration of the for loop in line 9 (Algorithm 6), depending on the value of Y_total. Because of the check in line 13, \(T_{1} + T_{2} + T_{3} + \ldots = {\mathsf {hiThresh}}\). Therefore, the complexity of all invocations of BSAT is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (T_{1} + T_{2} + T_{3} + \ldots )) = \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot {\mathsf {hiThresh}}) \). The complexity of ExtractSlice in line 12 is \( \mathcal {O}({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))^{2}) \) [13], and the loop in line 9 is executed at most \(\mathcal {O}(\log \log {\mathsf {m}}) \) times. Therefore, the complexity of ReverseSearch is \(\mathcal {O}(\log \log {\mathsf {m}}\cdot {\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))^{2} + {\mathsf {m}}\cdot {\mathsf {n}} \cdot {\mathsf {hiThresh}})\), which is \(\mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}} \cdot {\mathsf {hiThresh}})\). \(\ \Box \)
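The final simplification can be spelled out as follows. This derivation is ours and assumes, as is usual for ApproxMC-style thresholds but not stated in this appendix, that \(\mathsf{hiThresh} = \Theta(1/\varepsilon^{2})\) with \(\varepsilon \le 1\), and that \(\mathsf{m} \ge 4\) so that \(\log\log \mathsf{m}\) is positive:

\[
\begin{aligned}
\log\log \mathsf{m} \cdot \mathsf{n}\,\bigl(\log \mathsf{m} + \log(1/\varepsilon^{2})\bigr)^{2}
&\le 2\,\mathsf{n}\,\log\log \mathsf{m}\,\bigl((\log \mathsf{m})^{2} + (\log(1/\varepsilon^{2}))^{2}\bigr) \\
&= \mathcal{O}\bigl(\mathsf{n}\cdot\mathsf{m} + \mathsf{n}\cdot\mathsf{m}\cdot(1/\varepsilon^{2})\bigr) \\
&= \mathcal{O}(\mathsf{m}\cdot\mathsf{n}\cdot\mathsf{hiThresh}),
\end{aligned}
\]

using \((a+b)^{2} \le 2(a^{2}+b^{2})\), \((\log \mathsf{m})^{2}\log\log \mathsf{m} = \mathcal{O}(\mathsf{m})\), \(\log\log \mathsf{m} \le \mathsf{m}\), \((\log t)^{2} = \mathcal{O}(t)\) for \(t = 1/\varepsilon^{2} \ge 1\), and finally \(\mathsf{hiThresh} = \Omega(1/\varepsilon^{2})\).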

We are now ready to prove Theorem 1.

Proof

In Algorithm 2, ApproxMCCore is invoked \( \mathcal {O}(\log (1/\delta )) \) times, and each invocation makes a call to ReverseSearch. The complexity of SampleHashFunction is \( \mathcal {O}({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))) \) [13]. Since \(\mathsf {hiThresh} = \mathcal {O}(1/\varepsilon ^{2}) \), the complexity of Algorithm 2 is \( \mathcal {O}(\mathsf {m}\cdot {\mathsf {n}}\cdot (1/\varepsilon ^{2})\cdot \log (1/\delta ) + \mathsf {n}(\log \mathsf {m} + \log (1/\varepsilon ^{2}))) \), which is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (1/\varepsilon ^{2})\cdot \log (1/\delta ))\), because \(\log \mathsf {m} \le \mathsf {m}\) and \(\log (1/\varepsilon ^{2}) < 1/\varepsilon ^{2}\), so the second term is absorbed by the first. \(\ \Box \)

About this article

Cite this article

Meel, K.S., Shrotri, A.A. & Vardi, M.Y. Not all FPRASs are equal: demystifying FPRASs for DNF-counting. Constraints 24, 211–233 (2019). https://doi.org/10.1007/s10601-018-9301-x

Issue Date: October 2019
