Learning From Query-Answers
ACM Transactions on Database Systems (IF 2.2), Pub Date: 2018-12-10, DOI: 10.1145/3277503
Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer

Tuple-independent and disjoint-independent probabilistic databases (TI- and DI-PDBs) represent uncertain data in a factorized form as a product of independent random variables that represent either tuples (TI-PDBs) or sets of tuples (DI-PDBs). When the user submits a query, the database derives the marginal probabilities of each output-tuple, exploiting the underlying assumptions of statistical independence. While query processing in TI- and DI-PDBs has been studied extensively, limited research has been dedicated to the problems of updating or deriving the parameters from observations of query results. Addressing this problem is the main focus of this article. We first introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. The key idea of B-PDBs is to treat each parameter as a latent, Beta-distributed random variable. We show how this simple expedient enables both belief updating and parameter learning in a principled way, without imposing any burden on regular query processing. Building on B-PDBs, we then introduce Dirichlet Probabilistic Databases (D-PDBs), a generalization of DI-PDBs with similar properties. We provide the following key contributions for both B- and D-PDBs: (i) We study the complexity of performing Bayesian belief updates and devise efficient algorithms for certain tractable classes of queries; (ii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iii) we present an algorithm for efficiently computing conditional probabilities, allowing us to efficiently implement B- and D-PDBs via a standard relational engine; and (iv) we support our conclusions with extensive experimental results.
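To make the factorized representation and the Beta-parameter idea concrete, here is a minimal Python sketch. It is illustrative only: the class and function names are hypothetical, and the update shown is the plain conjugate Beta-Bernoulli case for a directly observed tuple, not the paper's general belief-update algorithm over query answers. It shows how a TI-PDB derives a query's marginal probability as a product of independent tuple probabilities, and how treating each probability as a latent Beta-distributed parameter allows a simple Bayesian update.

```python
# Minimal, illustrative sketch (not the paper's algorithm): a tiny
# tuple-independent PDB where each tuple's marginal probability is a
# latent Beta-distributed parameter, in the spirit of B-PDBs.

from dataclasses import dataclass


@dataclass
class TupleParam:
    """Beta(alpha, beta) belief over one tuple's marginal probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected probability that the tuple is present.
        return self.alpha / (self.alpha + self.beta)

    def observe(self, present: bool) -> None:
        # Conjugate Beta-Bernoulli update after directly observing
        # whether the tuple appears in the database instance.
        if present:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def conjunctive_query_prob(tuples: list[TupleParam]) -> float:
    """P(all tuples present) under tuple-independence, using the
    expected value of each latent Beta parameter."""
    p = 1.0
    for t in tuples:
        p *= t.mean
    return p


if __name__ == "__main__":
    r1 = TupleParam(alpha=2.0, beta=2.0)   # prior mean 0.5
    r2 = TupleParam(alpha=8.0, beta=2.0)   # prior mean 0.8
    print("P(q) before update:", conjunctive_query_prob([r1, r2]))

    # Observing r1 shifts its Beta posterior, and the query's
    # marginal probability changes accordingly.
    r1.observe(True)
    print("P(q) after observing r1:", conjunctive_query_prob([r1, r2]))
```

The sketch only covers the tractable case of observing an individual tuple; the article's contribution is handling observations of arbitrary query answers, where the posterior over the parameters is no longer a simple conjugate update.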

Updated: 2018-12-10