Declarative Probabilistic Programming with Datalog

Published: 27 October 2017

Abstract

Probabilistic programming languages are used for developing statistical models. They typically consist of two components: a specification of a stochastic process (the prior) and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence.

In this article, we establish a probabilistic-programming extension of Datalog that, on the one hand, allows for defining a rich family of statistical models, and on the other hand retains the fundamental properties of declarativity. Our proposed extension provides mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
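The following Python sketch is one illustrative way to read the ideas above; it is not the article's formal semantics, which is defined through chases. It assumes a hypothetical toy database (`CITY`, `HOUSE`), hypothetical rules written in the comments, and made-up probabilities. Rule conclusions contain values drawn from a Bernoulli distribution (the helper `flip`), and an observation is expressed as a constraint on the derived facts. Conditioning is approximated by rejection sampling, a technique chosen here only for simplicity.

```python
import random

# Hypothetical input database (extensional relations); names and numbers are illustrative.
CITY = {"napa": 0.03, "yucaipa": 0.01}   # City(name, burglary_rate)
HOUSE = {"h1": "napa", "h2": "yucaipa"}  # House(id, city)


def flip(rng, p):
    """Bernoulli draw: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def sample_outcome(rng):
    """Sample one possible outcome by firing each rule once per matching tuple."""
    # Earthquake(c, Flip[0.01]) <- City(c, r)
    earthquake = {c: flip(rng, 0.01) for c in CITY}
    # Burglary(h, Flip[r]) <- House(h, c), City(c, r)
    burglary = {h: flip(rng, CITY[c]) for h, c in HOUSE.items()}
    # Alarm(h) <- House(h, c), Earthquake(c, 1)
    # Alarm(h) <- Burglary(h, 1)
    alarm = {h for h, c in HOUSE.items()
             if earthquake[c] == 1 or burglary[h] == 1}
    return {"earthquake": earthquake, "burglary": burglary, "alarm": alarm}


def posterior_estimate(query, constraint, n=20000, seed=0):
    """Estimate P(query | constraint) by discarding outcomes that violate the constraint."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        outcome = sample_outcome(rng)
        if constraint(outcome):
            kept += 1
            hits += query(outcome)
    return hits / kept if kept else float("nan")


if __name__ == "__main__":
    # Observation: the alarm in house h1 went off. Query: was h1 burgled?
    estimate = posterior_estimate(
        query=lambda o: o["burglary"]["h1"],
        constraint=lambda o: "h1" in o["alarm"],
    )
    print(f"P(burglary in h1 | alarm in h1) is roughly {estimate:.2f}")
```

Under these made-up numbers the exact posterior is 0.03 / (1 - 0.99 * 0.97), so the estimate can be checked by hand. Rejection sampling becomes wasteful when the observation is rare, which is one reason a practical system would likely use a different inference method.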

        • Published in

          ACM Transactions on Database Systems, Volume 42, Issue 4
          Invited Paper from SIGMOD 2016, Invited Paper from PODS 2016, Invited Paper from ICDT 2016 and Regular Papers
          December 2017
          241 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/3155316
          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 October 2017
          • Accepted: 1 August 2017
          • Revised: 1 May 2017
          • Received: 1 August 2016

          Qualifiers

          • research-article
          • Research
          • Refereed
