Declarative Probabilistic Programming with Datalog

Authors:
Vince Bárány (LogicBlox, Inc.)
Balder ten Cate (LogicBlox, Inc.)
Benny Kimelfeld (Technion -- Israel Institute of Technology, Haifa, Israel)
Dan Olteanu (University of Oxford, UK)
Zografoula Vagena (LogicBlox, Inc., Atlanta, GA, USA)

ACM Transactions on Database Systems, Volume 42, Issue 4, Article No. 22, pp. 1–35. https://doi.org/10.1145/3132700

Published: 27 October 2017

Abstract

Probabilistic programming languages are used for developing statistical models. They typically consist of two components: a specification of a stochastic process (the prior) and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence.

In this article, we establish a probabilistic-programming extension of Datalog that, on the one hand, allows for defining a rich family of statistical models, and on the other hand retains the fundamental properties of declarativity. Our proposed extension provides mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
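To make the ideas in the abstract concrete, the following is a minimal sketch in Python, not the article's formalism or inference method. It assumes a hypothetical toy program (the relation names, the `Flip` notation, and the probabilities 0.1 and 0.2 are illustrative assumptions) in which rule conclusions contain values drawn from a Bernoulli distribution. One run of `sample_outcome` produces one possible outcome of the input database, and an observation is treated as an integrity constraint that every retained outcome must satisfy. Rejection sampling stands in here as a simple way to approximate the conditional distribution.

```python
import random

# Hypothetical toy program (names and parameters are illustrative assumptions):
#   Quake(c, Flip[0.1])    :- City(c).
#   Burglary(c, Flip[0.2]) :- City(c).
#   Alarm(c) :- Quake(c, 1).
#   Alarm(c) :- Burglary(c, 1).


def flip(p, rng):
    """Draw 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0


def sample_outcome(cities, rng):
    """Apply every rule once per matching fact and return one possible outcome."""
    # Each probabilistic rule fires once per city and draws its own value.
    quake = {c: flip(0.1, rng) for c in cities}
    burglary = {c: flip(0.2, rng) for c in cities}
    # The deterministic rules derive Alarm from the sampled facts.
    alarm = {c for c in cities if quake[c] == 1 or burglary[c] == 1}
    return {"Quake": quake, "Burglary": burglary, "Alarm": alarm}


def estimate_posterior(cities, observed, query, n=20000, seed=0):
    """Estimate P(query | observed) by rejection sampling over n outcomes."""
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(n):
        outcome = sample_outcome(cities, rng)
        # The observation acts as a constraint: violating outcomes are dropped.
        if observed(outcome):
            kept += 1
            if query(outcome):
                hits += 1
    if kept == 0:
        raise ValueError("no sampled outcome satisfied the observation")
    return hits / kept


p = estimate_posterior(
    ["napa"],
    observed=lambda o: "napa" in o["Alarm"],
    query=lambda o: o["Quake"]["napa"] == 1,
)
print(round(p, 2))
```

For this toy program the exact conditional probability is 0.1 / (1 - 0.9 × 0.8) = 0.1 / 0.28 ≈ 0.357, so the printed estimate should be close to that value. Rejection sampling is used only because it is short to write; it wastes every outcome that violates the observation and becomes impractical when the observation is unlikely.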

Index Terms

1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
2. Theory of computation
  1. Logic
    1. Constraint and logic programming
  2. Theory and algorithms for application domains
    1. Database theory
      1. Database query languages (principles)

Published in
ACM Transactions on Database Systems Volume 42, Issue 4
Invited Paper from SIGMOD 2016, Invited Paper from PODS 2016, Invited Paper from ICDT 2016 and Regular Papers
December 2017
241 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3155316
Editor:
Christian S. Jensen
Aalborg University, Denmark
Issue’s Table of Contents
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 October 2017
- Accepted: 1 August 2017
- Revised: 1 May 2017
- Received: 1 August 2016

Author Tags
Chase
Datalog
declarative
probabilistic programming
probability measure space
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 19
- Total Downloads: 839
- Downloads (last 12 months): 154
- Downloads (last 6 weeks): 12