Debugging Large-scale Datalog, ACM Transactions on Programming Languages and Systems


Debugging Large-scale Datalog
ACM Transactions on Programming Languages and Systems (IF 1.5), Pub Date: 2020-05-04, DOI: 10.1145/3379446
David Zhao, Pavle Subotić, Bernhard Scholz


Logic programming languages such as Datalog have become popular as Domain Specific Languages (DSLs) for solving large-scale, real-world problems, in particular, static program analysis and network analysis. The logic specifications that model analysis problems process millions of tuples of data and contain hundreds of highly recursive rules. As a result, they are notoriously difficult to debug. While the database community has proposed several data provenance techniques that address the Declarative Debugging Challenge for Databases, in the cases of analysis problems, these state-of-the-art techniques do not scale. In this article, we introduce a novel bottom-up Datalog evaluation strategy for debugging: Our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations and a new fixed-point semantics for semi-naïve evaluation. A debugging query mechanism allows arbitrary provenance queries, constructing partial proof trees of tuples with minimal height. We integrate our technique into Soufflé, a Datalog engine that synthesizes C++ code, and achieve high performance by using specialized parallel data structures. Experiments are conducted with DOOP/DaCapo, producing proof annotations for tens of millions of output tuples. We show that our method has a runtime overhead of 1.31× on average while being more flexible than existing state-of-the-art techniques.
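
To make the idea of proof annotations concrete, the following is a minimal Python sketch, not the article's implementation and not Soufflé's generated code. It assumes a toy two-rule program (rule 1: path(x, y) :- edge(x, y); rule 2: path(x, z) :- path(x, y), edge(y, z)) and assumes that each derived tuple is annotated with a (height, rule) pair, which is one plausible reading of "proof annotations". The names evaluate and explain are hypothetical. Semi-naive evaluation joins only the tuples discovered in the previous round, and the annotations are later used to rebuild one proof tree of minimal height on demand.

```python
# Illustrative sketch only (assumed annotation layout: (height, rule)).
#   rule 1: path(x, y) :- edge(x, y).
#   rule 2: path(x, z) :- path(x, y), edge(y, z).


def evaluate(edges):
    """Semi-naive evaluation; returns {path tuple: (height, rule)}."""
    edges = set(edges)
    # Input facts count as leaves of height 0, so rule 1 gives height 1.
    path = {e: (1, 1) for e in edges}
    delta = dict(path)  # tuples discovered in the previous round
    while delta:
        new = {}
        for (x, y), (h, _) in delta.items():
            for (y2, z) in edges:
                if y2 == y and (x, z) not in path:
                    # First discovery happens in the earliest possible
                    # round, so h + 1 is the minimal height for this rule.
                    new.setdefault((x, z), (h + 1, 2))
        path.update(new)
        delta = new
    return path


def explain(t, path, edges):
    """Rebuild one minimal-height proof tree for path tuple t."""
    height, rule = path[t]
    x, z = t
    if rule == 1:
        return ("path", t, "rule 1", [("edge", t)])
    for (y, z2) in sorted(edges):
        # Only accept a body tuple with a strictly smaller height, which
        # guarantees termination and keeps the tree at minimal height.
        if z2 == z and (x, y) in path and path[(x, y)][0] < height:
            sub = explain((x, y), path, edges)
            return ("path", t, "rule 2", [sub, ("edge", (y, z))])
    raise ValueError(f"no derivation found for {t}")


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
    annotated = evaluate(edges)
    print(annotated[("a", "d")])
    print(explain(("a", "d"), annotated, edges))
```

In this sketch the proof tree is never stored during evaluation; only the small annotation is kept per tuple, and the tree is reconstructed lazily when a query asks for it. That mirrors, in a very simplified form, the separation between annotated evaluation and a debugging query mechanism that the abstract describes.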


Updated: 2020-05-04
