Abstract
Probabilistic programming languages are used for developing statistical models. A program in such a language typically consists of two components: a specification of a stochastic process (the prior) and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence.
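As a concrete illustration of the prior/posterior split, here is a minimal Python sketch that is not taken from the article. The model is hypothetical: two fair coin flips form the prior, and the observation "at least one flip is heads" restricts the outcome space. The posterior is computed exactly by discarding outcomes that violate the observation and renormalizing.

```python
from fractions import Fraction
from itertools import product


def prior():
    """Hypothetical stochastic process: two independent fair coin flips (1 = heads)."""
    for a, b in product([0, 1], repeat=2):
        yield (a, b), Fraction(1, 4)


def observed(outcome):
    """Hypothetical observation: at least one flip came up heads."""
    return outcome[0] + outcome[1] >= 1


def posterior():
    """Restrict the prior to outcomes consistent with the observation, then renormalize."""
    kept = {o: p for o, p in prior() if observed(o)}
    z = sum(kept.values())
    return {o: p / z for o, p in kept.items()}


post = posterior()
print(post[(1, 1)])  # 1/3
```

Exact enumeration only works for tiny finite models; practical systems replace it with sampling or symbolic inference.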
In this article, we establish a probabilistic-programming extension of Datalog that, on the one hand, allows for defining a rich family of statistical models and, on the other hand, retains the fundamental properties of declarativity. Our proposed extension provides mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
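The following is a minimal Python sketch of the idea described above. It makes several assumptions that do not come from the article. The relations `City`, `Home`, `Earthquake`, `Burglary` and `Alarm` and their data are hypothetical. Only a single Bernoulli function `Flip[p]` is used. Each random event is drawn once per rule and head binding, identified by a key. Conditioning on an observation is done by plain rejection sampling. Rules fire in random chase order. Because each event is sampled exactly once whenever it fires, the outcome distribution in this sketch does not depend on that order. The article's language and its formal semantics are considerably more general than this.

```python
import random

# Hypothetical extensional database (EDB): one city with a burglary rate, two homes.
EDB = frozenset({
    ("City", "napa", 0.2),
    ("Home", "h1", "napa"),
    ("Home", "h2", "napa"),
})


def triggers(db):
    """Yield (key, head, p) for every rule body satisfied in db.

    If p is not None, a value drawn from Flip[p] (1 with probability p,
    else 0) is appended to head. The key names the random event, so each
    event is sampled at most once per chase (a simplifying assumption).
    """
    cities = [f for f in db if f[0] == "City"]
    homes = [f for f in db if f[0] == "Home"]
    for _, c, _r in cities:
        # Earthquake(c, Flip[0.1]) <- City(c, r)
        yield ("eq", c), ("Earthquake", c), 0.1
    for _, h, c in homes:
        for _, c2, r in cities:
            if c == c2:
                # Burglary(h, Flip[r]) <- Home(h, c), City(c, r)
                yield ("burg", h, r), ("Burglary", h), r
        if ("Earthquake", c, 1) in db:
            # Alarm(h) <- Home(h, c), Earthquake(c, 1)
            yield ("alarm_eq", h, c), ("Alarm", h), None
    for f in db:
        if f[0] == "Burglary" and f[2] == 1:
            # Alarm(h) <- Burglary(h, 1)
            yield ("alarm_b", f[1]), ("Alarm", f[1]), None


def chase(edb, rng):
    """Fire not-yet-fired triggers in random order until none is left."""
    db, fired = set(edb), set()
    while True:
        # Sorted by key so that a seeded rng gives reproducible runs.
        pending = sorted((t for t in triggers(db) if t[0] not in fired),
                         key=lambda t: t[0])
        if not pending:
            return frozenset(db)
        key, head, p = rng.choice(pending)
        fired.add(key)
        db.add(head if p is None else head + (1 if rng.random() < p else 0,))


def estimate(n, rng):
    """Estimate P(Burglary(h1, 1) | Alarm(h1)) by rejection sampling.

    The observation acts as a constraint: chase outcomes without
    Alarm(h1) are discarded.
    """
    accepted = hits = 0
    for _ in range(n):
        db = chase(EDB, rng)
        if ("Alarm", "h1") in db:
            accepted += 1
            hits += ("Burglary", "h1", 1) in db
    return hits / accepted


# Should be close to the exact value 0.2 / (1 - 0.9 * 0.8) ≈ 0.714.
print(round(estimate(10_000, random.Random(0)), 2))
```

Rejection sampling is the simplest correct way to condition, but it becomes wasteful when the observation is unlikely. It stands in here for whatever inference procedure an actual implementation would use.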
Recommendations
Generative Datalog with Continuous Distributions
Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a “purely declarative probabilistic programming language.” We revisit this language and ...