Open-world probabilistic databases: Semantics, algorithms, complexity
Artificial Intelligence (IF 14.4), Pub Date: 2021-02-15, DOI: 10.1016/j.artint.2021.103474
İsmail İlkan Ceylan , Adnan Darwiche , Guy Van den Broeck

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art for storing and processing such data is founded on probabilistic databases. Many systems based on probabilistic databases, however, still have certain semantic deficiencies, which limit their potential applications. We revisit the semantics of probabilistic databases and argue that their closed-world assumption (i.e., the assumption that facts not appearing in the database have probability zero) conflicts with the everyday use of large-scale probabilistic knowledge bases. To address this discrepancy, we propose open-world probabilistic databases as a new probabilistic data model. In this new data model, unknown facts, also called open facts, can be assigned any probability value from a default probability interval. Our analysis shows that our model aligns better with many real-world tasks such as query answering, relational learning, knowledge base completion, and rule mining. We make various technical contributions. We show that the data complexity dichotomy between polynomial time and #P for evaluating unions of conjunctive queries on probabilistic databases can be lifted to our open-world model. This result is supported by an algorithm that efficiently computes the probabilities of the so-called safe queries. Based on this algorithm, we prove that evaluating safe queries takes linear time for probabilistic databases, under reasonable assumptions. This remains true in open-world probabilistic databases for a more restricted class of safe queries. We extend our data complexity analysis beyond unions of conjunctive queries and obtain a host of complexity results for both classical and open-world probabilistic databases. We conclude our analysis with an in-depth investigation of the combined complexity in the respective models.
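The following minimal Python sketch illustrates the open-world semantics under stated assumptions. It is an illustration, not the paper's algorithm, and the following choices are hypothetical:

- a tuple-independent database over a small finite domain;
- two relations `R` and `S`;
- the safe query Q = ∃x∃y R(x) ∧ S(x, y);
- a default interval of the form [0, λ].

Because a union of conjunctive queries is monotone, its probability can only grow as tuple probabilities grow. So the closed-world probability (open facts at 0) bounds it from below, and setting every open fact to λ bounds it from above.

The evaluation applies the standard lifted rules for this hierarchical query. It uses an independent project on the separator variable x and an independent join between R(a) and ∃y S(a, y). The loop is quadratic in the domain size, not linear.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a tuple-independent table maps each stored fact to its probability.
ProbTable = Dict[Tuple[str, ...], float]


def fact_prob(table: ProbTable, fact: Tuple[str, ...], default: float) -> float:
    """Stored probability of a fact, or `default` if the fact is absent.

    default = 0.0 models the closed-world assumption; default = lam gives every
    open fact the top of the assumed default interval [0, lam].
    """
    return table.get(fact, default)


def prob_q(r: ProbTable, s: ProbTable, domain: Set[str], default: float) -> float:
    """P(Q) for the hypothetical safe query Q = exists x, y. R(x) and S(x, y).

    x is a separator variable, so the events for distinct constants a are
    independent (independent project); R(a) and exists y. S(a, y) share no
    facts, so their probabilities multiply (independent join).
    """
    p_no_match = 1.0
    for a in domain:
        # P(exists y. S(a, y)) = 1 - prod_b (1 - P(S(a, b)))
        p_no_s = 1.0
        for b in domain:
            p_no_s *= 1.0 - fact_prob(s, (a, b), default)
        p_branch = fact_prob(r, (a,), default) * (1.0 - p_no_s)
        p_no_match *= 1.0 - p_branch
    return 1.0 - p_no_match


def open_world_bounds(r: ProbTable, s: ProbTable, domain: Set[str], lam: float) -> Tuple[float, float]:
    """Assumed probability interval of the monotone query Q when open facts range over [0, lam]."""
    lower = prob_q(r, s, domain, default=0.0)  # closed world: open facts at 0
    upper = prob_q(r, s, domain, default=lam)  # every open fact at lam
    return lower, upper


if __name__ == "__main__":
    r = {("a",): 0.8}
    s = {("a", "b"): 0.5}
    print(open_world_bounds(r, s, {"a", "b"}, lam=0.1))
```

With R(a) = 0.8, S(a, b) = 0.5, domain {a, b} and λ = 0.1, the closed-world probability is 0.4. The upper bound rises to about 0.45064, because the open facts R(b), S(a, a), S(b, a) and S(b, b) now contribute.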




Updated: 2021-02-15