-
Flexible Skylines: Dominance for Arbitrary Sets of Monotone Functions ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-12-10 Paolo Ciaccia; Davide Martinenghi
Skyline and ranking queries are two popular, alternative ways of discovering interesting data in large datasets. Skyline queries are simple to specify, as they just return the set of all non-dominated tuples, thereby providing an overall view of potentially interesting results. However, they are not equipped with any means to accommodate user preferences or to control the cardinality of the result
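To make the dominance notion above concrete, here is a minimal sketch of a plain (non-flexible) skyline computed with a naive quadratic scan. It assumes that every attribute is to be minimized, for example the price and distance of hypothetical hotels. It is not the flexible-skyline operator the article proposes.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b: a is no worse on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(tuples: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every tuple that no other tuple dominates.
    A tuple never dominates itself, so no self-exclusion check is needed."""
    return [t for t in tuples if not any(dominates(u, t) for u in tuples)]


# Hypothetical hotels as (price, distance) pairs.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (70, 5)]
print(skyline(hotels))
```

In this example, (90, 9) is dominated by (50, 8), and (70, 5) is dominated by (60, 5). The remaining three tuples are mutually incomparable, which illustrates why a skyline gives no direct control over result cardinality.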
-
Incremental and Approximate Computations for Accelerating Deep CNN Inference ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-12-06 Supun Nakandala; Kabir Nagrecha; Arun Kumar; Yannis Papakonstantinou
Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) is especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, a lot of work in the computer architecture, systems, and compilers communities
-
Functional Aggregate Queries with Additive Inequalities ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-12-06 Mahmoud Abo Khamis; Ryan R. Curtin; Benjamin Moseley; Hung Q. Ngo; Xuanlong Nguyen; Dan Olteanu; Maximilian Schleich
Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short. To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositions
-
MobilityDB: A Mobility Database Based on PostgreSQL and PostGIS ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-12-06 Esteban Zimányi; Mahmoud Sakr; Arthur Lesuisse
Despite two decades of research in moving object databases and a few research prototypes that have been proposed, there is not yet a mainstream system targeted for industrial use. In this article, we present MobilityDB, a moving object database that extends the type system of PostgreSQL and PostGIS with abstract data types for representing moving object data. The types are fully integrated into the
-
Editorial ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-09-11 Chris Jermaine
No abstract available.
-
Discovering Graph Functional Dependencies ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-09-11 Wenfei Fan; Chunming Hu; Xueli Liu; Ping Lu
This article studies discovery of Graph Functional Dependencies (GFDs), a class of functional dependencies defined on graphs. We investigate the fixed-parameter tractability of three fundamental problems related to GFD discovery. We show that the implication and satisfiability problems are fixed-parameter tractable, but the validation problem is co-W[1]-hard in general. We introduce notions of reduced
-
Maintaining Triangle Queries under Updates ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-08-26 Ahmet Kara; Hung Q. Ngo; Milos Nikolic; Dan Olteanu; Haozhe Zhang
We consider the problem of incrementally maintaining the triangle queries with arbitrary free variables under single-tuple updates to the input relations. We introduce an approach called IVMϵ that exhibits a trade-off between the update time, the space, and the delay for the enumeration of the query result, such that the update time ranges from the square root to linear in the database size while the
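For context on what "incremental maintenance under single-tuple updates" means, here is a minimal sketch of the naive first-order delta approach for counting the triangle query Q(a,b,c) = R(a,b), S(b,c), T(c,a) under set semantics. This is an assumed baseline, not IVMϵ. Each update joins the changed tuple with the other two relations, so in the worst case it can take time linear in the database size. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class TriangleCount:
    """Maintains |Q| for Q(a,b,c) = R(a,b), S(b,c), T(c,a) (set semantics).

    Relations are indexed 0=R, 1=S, 2=T; they form a cycle, so relation i is
    followed by relation (i+1) % 3 and preceded by relation (i+2) % 3."""

    def __init__(self) -> None:
        # rels[i] maps a first-column value to the set of second-column values
        self.rels: List[DefaultDict[int, Set[int]]] = [defaultdict(set) for _ in range(3)]
        self.count = 0

    def _delta(self, i: int, x: int, y: int) -> int:
        # Triangles through tuple (x, y) of relation i: every z with
        # (y, z) in the next relation and (z, x) in the previous one.
        nxt, prv = self.rels[(i + 1) % 3], self.rels[(i + 2) % 3]
        return sum(1 for z in nxt.get(y, ()) if x in prv.get(z, ()))

    def update(self, i: int, x: int, y: int, insert: bool = True) -> None:
        present = y in self.rels[i].get(x, ())
        if insert == present:
            return  # inserting an existing tuple or deleting a missing one is a no-op
        d = self._delta(i, x, y)  # reads only the other two relations
        if insert:
            self.rels[i][x].add(y)
            self.count += d
        else:
            self.rels[i][x].discard(y)
            self.count -= d


tc = TriangleCount()
tc.update(0, 1, 2)  # R(1,2)
tc.update(1, 2, 3)  # S(2,3)
tc.update(2, 3, 1)  # T(3,1) closes the triangle (1,2,3)
print(tc.count)
```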
-
Synthesis of Incremental Linear Algebra Programs ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-08-26 Amir Shaikhha; Mohammed Elseidy; Stephan Mihaila; Daniel Espino; Christoph Koch
This article targets the Incremental View Maintenance (IVM) of sophisticated analytics (such as statistical models, machine learning programs, and graph algorithms) expressed as linear algebra programs. We present LAGO, a unified framework for linear algebra that automatically synthesizes efficient incremental trigger programs, thereby freeing the user from error-prone manual derivations, performance
-
Efficient Discovery of Matching Dependencies ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-08-26 Philipp Schirmer; Thorsten Papenbrock; Ioannis Koumarelas; Felix Naumann
Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They generalize functional dependencies (FDs) by matching similar rather than identical elements. As their discovery is very difficult, existing profiling algorithms either find only small subsets of all MDs or are limited to small datasets. We focus
-
Packing R-trees with Space-filling Curves: Theoretical Optimality, Empirical Efficiency, and Bulk-loading Parallelizability ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-08-26 Jianzhong Qi; Yufei Tao; Yanchuan Chang; Rui Zhang
The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index management, and over both practical and worst-case workloads. To address this need, we revisit two classic multidimensional access methods—the R-tree and the space-filling curve. We propose a novel R-tree packing strategy based on space-filling
-
Succinct Range Filters ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-06-21 Huanchen Zhang; Hyeontaek Lim; Viktor Leis; David G. Andersen; Michael Kaminsky; Kimberly Keeton; Andrew Pavlo
We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point and range query performance of state-of-the-art
-
Adaptive Asynchronous Parallelization of Graph Algorithms ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-07-05 Wenfei Fan; Ping Lu; Wenyuan Yu; Jingbo Xu; Qiang Yin; Xiaojian Luo; Jingren Zhou; Ruochun Jin
This article proposes an Adaptive Asynchronous Parallel (AAP) model for graph computations. As opposed to Bulk Synchronous Parallel (BSP) and Asynchronous Parallel (AP) models, AAP reduces both stragglers and stale computations by dynamically adjusting the relative progress of workers. We show that BSP, AP, and the Stale Synchronous Parallel (SSP) model are special cases of AAP. Better yet, AAP optimizes parallel
-
Learning Models over Relational Data Using Sparse Tensors and Functional Dependencies ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-06-27 Mahmoud Abo Khamis; Hung Q. Ngo; Xuanlong Nguyen; Dan Olteanu; Maximilian Schleich
Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into
-
On the Language of Nested Tuple Generating Dependencies ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-07-13 Phokion G. Kolaitis; Reinhard Pichler; Emanuel Sallinger; Vadim Savenkov
During the past 15 years, schema mappings have been extensively used in formalizing and studying such critical data interoperability tasks as data exchange and data integration. Much of the work has focused on GLAV mappings, i.e., schema mappings specified by source-to-target tuple-generating dependencies (s-t tgds), and on schema mappings specified by second-order tgds (SO tgds), which constitute
-
Catching Numeric Inconsistencies in Graphs ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-06-27 Wenfei Fan; Xueli Liu; Ping Lu; Chao Tian
Numeric inconsistencies are common in real-life knowledge bases and social networks. To catch such errors, we extend graph functional dependencies with linear arithmetic expressions and built-in comparison predicates, referred to as numeric graph dependencies (NGDs). We study fundamental problems for NGDs. We show that their satisfiability, implication, and validation problems are Σ₂ᵖ-complete, Π₂ᵖ-complete
-
Editorial ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-02-17 Christian S. Jensen
No abstract available.
-
Computing Optimal Repairs for Functional Dependencies ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-02-17 Ester Livshits; Benny Kimelfeld; Sudeepa Roy
We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is obtained by a minimum number of tuple deletions, and an optimal update repair (optimal U-repair), which is obtained by a minimum number of value (cell)
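As an illustration of an optimal S-repair, here is a minimal sketch for the special case of a single FD X → Y, which is assumed here rather than taken from the article. Tuples that agree on X must agree on Y, so within each X-group we keep the tuples carrying the most frequent Y-value and delete the rest. This minimizes the number of deletions for that one FD. It is not the article's general algorithm for sets of FDs, and the relation and attribute names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, object]


def optimal_s_repair_single_fd(rows: List[Row], lhs: Sequence[str],
                               rhs: Sequence[str]) -> List[Row]:
    """Return a maximum subset of rows satisfying the single FD lhs -> rhs."""
    groups: Dict[tuple, List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    kept: List[Row] = []
    for group in groups.values():
        # Keep the majority rhs-value in each group (ties: first value seen).
        counts = Counter(tuple(r[a] for a in rhs) for r in group)
        best_rhs, _ = counts.most_common(1)[0]
        kept.extend(r for r in group if tuple(r[a] for a in rhs) == best_rhs)
    return kept


# Hypothetical relation violating the FD zip -> city.
rows = [
    {"id": 1, "zip": "10001", "city": "New York"},
    {"id": 2, "zip": "10001", "city": "New York"},
    {"id": 3, "zip": "10001", "city": "Newark"},
    {"id": 4, "zip": "94105", "city": "San Francisco"},
]
repair = optimal_s_repair_single_fd(rows, ["zip"], ["city"])
print(sorted(r["id"] for r in repair))
```

Only tuple 3 is deleted, since one deletion is the minimum needed to make the zip group 10001 consistent.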
-
A Game-theoretic Approach to Data Interaction ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-02-08 Ben McCamish; Vahid Ghadakchi; Arash Termehchy; Behrouz Touri; Eduardo Cotilla-Sanchez; Liang Huang; Soravit Changpinyo
As most users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. The database management system (DBMS) may interact with users and use their feedback on the returned results to learn the information needs behind their queries. Current query interfaces assume that users do not learn and modify the way they express their
-
ϵKTELO ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-02-08 Dan Zhang; Ryan McKenna; Ios Kotsogiannis; George Bissias; Michael Hay; Ashwin Machanavajjhala; Gerome Miklau
The adoption of differential privacy is growing, but the complexity of designing private, efficient, and accurate algorithms is still high. We propose a novel programming framework and system, ϵKTELO, for implementing both existing and new privacy algorithms. For the task of answering linear counting queries, we show that nearly all existing algorithms can be composed from operators, each conforming
-
Efficient Enumeration Algorithms for Regular Document Spanners ACM Trans. Database Syst. (IF 2.927) Pub Date : 2020-02-08 Fernando Florenzano; Cristian Riveros; Martín Ugarte; Stijn Vansummeren; Domagoj Vrgoč
Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to locate the data that a user wants to extract from a text document and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have efficient evaluation
-
Dichotomies for Evaluating Simple Regular Path Queries ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-12-17 Wim Martens; Tina Trautner
Regular path queries (RPQs) are a central component of graph databases. We investigate decision and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, paths without node repetitions (simple paths), and paths without edge repetitions (trails). Whereas arbitrary and shortest paths can be dealt with efficiently
-
General Temporally Biased Sampling Schemes for Online Model Management ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-12-17 Brian Hentschel; Peter J. Haas; Yuanyuan Tian
To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying over time according to a specified “decay function.” We then periodically retrain the models on the current sample. This approach speeds up the training process
-
On the Expressive Power of Query Languages for Matrices ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-12-17 Robert Brijder; Floris Geerts; Jan Van Den Bussche; Timmy Weerwag
We investigate the expressive power of MATLANG, a formal language for matrix manipulation based on common matrix operations and linear algebra. The language can be extended with the operation inv for inverting a matrix. In MATLANG + inv, we can compute the transitive closure of directed graphs, whereas we show that this is not possible without inversion. Indeed, we show that the basic language can
-
ChronicleDB ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-12-17 Marc Seidemann; Nikolaus Glombiewski; Michael Körber; Bernhard Seeger
Reactive security monitoring, self-driving cars, the Internet of Things (IoT), and many other novel applications require systems that both write events arriving at very high and fluctuating rates to persistent storage and support analytical ad hoc queries. As standard database systems are not capable of delivering the required write performance, log-based systems, key-value stores, and other
-
Design and Evaluation of an RDMA-aware Data Shuffling Operator for Parallel Database Systems ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-12-17 Feilong Liu; Lingyan Yin; Spyros Blanas
The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This article considers how to leverage RDMA to improve the analytical
-
Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-12-17 Sibo Wang; Renchi Yang; Runhui Wang; Xiaokui Xiao; Zhewei Wei; Wenqing Lin; Yin Yang; Nan Tang
Given a graph G, a source node s, and a target node t, the personalized PageRank (PPR) of t with respect to s is the probability that a random walk starting from s terminates at t. An important variant of the PPR query is single-source PPR (SSPPR), which enumerates all nodes in G and returns the top-k nodes with the highest PPR values with respect to a given source s. PPR in general and SSPPR in particular
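To make the PPR definition above concrete, here is a minimal sketch that computes single-source PPR values by propagating the walk's probability mass step by step. At each node the walk terminates with probability alpha, and otherwise it moves to a uniformly random out-neighbour. The value alpha = 0.2, the truncation at 100 steps, and the requirement that every node has at least one out-edge are all assumptions. This is a simple near-exact baseline, not one of the article's approximate algorithms.

```python
from typing import Dict, List


def single_source_ppr(graph: Dict[int, List[int]], s: int,
                      alpha: float = 0.2, iters: int = 100) -> Dict[int, float]:
    """ppr[t] ~ probability that a random walk from s terminates at t.

    Assumes every node has at least one out-neighbour (no dangling nodes).
    Mass still walking after `iters` steps, (1 - alpha) ** iters, is dropped."""
    ppr = {v: 0.0 for v in graph}
    walk = {v: 0.0 for v in graph}  # probability the walk is at v and still alive
    walk[s] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for v, mass in walk.items():
            if mass == 0.0:
                continue
            ppr[v] += alpha * mass                     # walk terminates at v
            share = (1 - alpha) * mass / len(graph[v])
            for w in graph[v]:
                nxt[w] += share                        # walk continues to w
        walk = nxt
    return ppr


def top_k(ppr: Dict[int, float], k: int) -> List[int]:
    """Nodes with the k highest PPR values (the SSPPR top-k answer)."""
    return sorted(ppr, key=ppr.get, reverse=True)[:k]


g = {0: [1, 2], 1: [2], 2: [0]}
scores = single_source_ppr(g, s=0)
print(top_k(scores, 2))
```

For this small graph, the fixpoint equations give ppr(0) = 0.2 / 0.424 ≈ 0.472, ppr(2) ≈ 0.340, and ppr(1) ≈ 0.189. Approximate methods avoid this full propagation on large graphs.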
-
From a Comprehensive Experimental Survey to a Cost-based Selection Strategy for Lightweight Integer Compression Algorithms ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-06-19 Patrick Damme; Annett Ungethüm; Juliana Hildebrandt; Dirk Habich; Wolfgang Lehner
Lightweight integer compression algorithms are frequently applied in in-memory database systems to tackle the growing gap between processor speed and main memory bandwidth. In recent years, the vectorization of basic techniques such as delta coding and null suppression has considerably enlarged the corpus of available algorithms. As a result, today there is a large number of algorithms to choose from
-
Verification of Hierarchical Artifact Systems ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-06-19 Alin Deutsch; Yuliang Li; Victor Vianu
Data-driven workflows, of which IBM’s Business Artifacts are a prime exponent, have been successfully deployed in practice, adopted in industrial standards, and have spawned a rich body of research in academia, focused primarily on static analysis. The present work represents a significant advance on the problem of artifact verification by considering a much richer and more realistic model than in
-
Interactive Mapping Specification with Exemplar Tuples ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-06-19 Angela Bonifati; Ugo Comignani; Emmanuel Coquery; Romuald Thion
While schema mapping specification is a cumbersome task for data curation specialists, it becomes infeasible for non-expert users, who are unacquainted with the semantics and languages of the involved transformations. In this article, we present an interactive framework for schema mapping specification suited for non-expert users. The underlying key intuition is to leverage a few exemplar tuples to
-
A Unified Framework for Frequent Sequence Mining with Subsequence Constraints ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-06-19 Kaustubh Beedkar; Rainer Gemulla; Wim Martens
Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints—including and beyond those considered in the literature—can be unified in a single
-
Output-Optimal Massively Parallel Algorithms for Similarity Joins ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-04-08 Xiao Hu; Ke Yi; Yufei Tao
Parallel join algorithms have received much attention in recent years due to the rapid development of massively parallel systems such as MapReduce and Spark. In the database theory community, most efforts have been focused on studying worst-case optimal algorithms. However, the worst-case optimality of these join algorithms relies on the hard instances having very large output sizes. In the case of
-
A Survey of Spatial Crowdsourcing ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-04-08 Srinivasa Raghavendra Bhuvan Gummidi; Xike Xie; Torben Bach Pedersen
Widespread use of advanced mobile devices has led to the emergence of a new class of crowdsourcing called spatial crowdsourcing. Spatial crowdsourcing advances the potential of a crowd to perform tasks related to real-world scenarios involving physical locations, which were not feasible with conventional crowdsourcing methods. The main feature of spatial crowdsourcing is the presence of spatial tasks
-
Inferring Insertion Times and Optimizing Error Penalties in Time-decaying Bloom Filters ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-04-08 Jonathan L. Dautrich; Chinya V. Ravishankar
Current Bloom Filters tend to ignore Bayesian priors as well as a great deal of useful information they hold, compromising the accuracy of their responses. Incorrect responses cause users to incur penalties that are both application- and item-specific, but current Bloom Filters are typically tuned only for static penalties. Such shortcomings are problematic for all Bloom Filter variants, but especially
-
Dependencies for Graphs ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-04-08 Wenfei Fan; Ping Lu
This article proposes a class of dependencies for graphs, referred to as graph entity dependencies (GEDs). A GED is defined as a combination of a graph pattern and an attribute dependency. In a uniform format, GEDs can express graph functional dependencies with constant literals to catch inconsistencies, and keys carrying id literals to identify entities (vertices) in a graph. We revise the chase for
-
Representations and Optimizations for Embedded Parallel Dataflow Languages ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-01-29 Alexander Alexandrov; Georgi Krastev; Volker Markl
Parallel dataflow engines such as Apache Hadoop, Apache Spark, and Apache Flink are an established alternative to relational databases for modern data analysis applications. A characteristic of these systems is a scalable programming model based on distributed collections and parallel transformations expressed by means of second-order functions such as map and reduce. Notable examples are Flink’s DataSet
-
Wander Join and XDB ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-01-29 Feifei Li; Bin Wu; Ke Yi; Zhuoyue Zhao
Joins are expensive, and online aggregation over joins was proposed to mitigate the cost; it offers users a flexible tradeoff between query efficiency and accuracy in a continuous, online fashion. However, the state-of-the-art approach, in both internal and external memory, is based on ripple join, which is still very expensive and even needs unrealistic assumptions (e.g., tuples in a table
-
Historic Moments Discovery in Sequence Data ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-01-29 Ran Bai; Wing Kai Hon; Eric Lo; Zhian He; Kenny Zhu
Many emerging applications are based on finding interesting subsequences in sequence data. Finding “prominent streaks,” a set of the longest contiguous subsequences with values all above (or below) a certain threshold, is one such task that has received much attention. Motivated by real applications, we observe that prominent streaks alone are not insightful enough but require
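Following the simplified description above, here is a minimal sketch that finds, for one fixed threshold, the longest contiguous runs whose values all lie strictly above that threshold. This reading of "prominent streaks" is an assumption. It does not capture the full streak-dominance definition used in the literature, nor the article's historic-moments extension. The function name and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def longest_streaks(values: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of the longest runs in
    which every value is strictly above `threshold` (all ties are returned)."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, x in enumerate(values):
        if x > threshold:
            if start is None:
                start = i            # a new run begins
        elif start is not None:
            runs.append((start, i))  # the current run ends before i
            start = None
    if start is not None:
        runs.append((start, len(values)))  # run reaches the end of the sequence
    if not runs:
        return []
    best = max(e - s for s, e in runs)
    return [(s, e) for s, e in runs if e - s == best]


# Hypothetical per-game points of a player.
points = [12, 25, 31, 28, 9, 22, 27, 30, 5]
print(longest_streaks(points, 20))
```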
-
Scalable Analytics on Fast Data ACM Trans. Database Syst. (IF 2.927) Pub Date : 2019-01-29 Andreas Kipf; Varun Pandey; Jan Böttcher; Lucas Braun; Thomas Neumann; Alfons Kemper
Today’s streaming applications demand increasingly high event throughput rates and are often subject to strict latency constraints. To allow for more complex workloads, such as window-based aggregations, streaming systems need to support stateful event processing. This introduces new challenges for streaming engines as the state needs to be maintained in a consistent and durable manner and simultaneously
-
Parallelizing Sequential Graph Computations ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-12-16 Wenfei Fan; Wenyuan Yu; Jingbo Xu; Jingren Zhou; Xiaojian Luo; Qiang Yin; Ping Lu; Yang Cao; Ruiqi Xu
This article presents GRAPE, a parallel GRAPh Engine for graph computations. GRAPE differs from prior systems in its ability to parallelize existing sequential graph algorithms as a whole, without the need for recasting the entire algorithm into a new model. Underlying GRAPE are a simple programming model and a principled approach based on fixpoint computation that starts with partial evaluation and
-
MacroBase ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-12-16 Firas Abuzaid; Peter Bailis; Jialin Ding; Edward Gan; Samuel Madden; Deepak Narayanan; Kexin Rong; Sahaana Suri
As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver
-
Learning From Query-Answers ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-12-16 Niccolò Meneghetti; Oliver Kennedy; Wolfgang Gatterbauer
Tuple-independent and disjoint-independent probabilistic databases (TI- and DI-PDBs) represent uncertain data in a factorized form as a product of independent random variables that represent either tuples (TI-PDBs) or sets of tuples (DI-PDBs). When the user submits a query, the database derives the marginal probabilities of each output-tuple, exploiting the underlying assumptions of statistical independence
-
Optimal Bloom Filters and Adaptive Merging for LSM-Trees ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-12-16 Niv Dayan; Manos Athanassoulis; Stratos Idreos
In this article, we show that key-value stores backed by a log-structured merge-tree (LSM-tree) exhibit an intrinsic tradeoff between lookup cost, update cost, and main memory footprint, yet all existing designs expose a suboptimal and difficult-to-tune tradeoff among these metrics. We pinpoint the problem to the fact that modern key-value stores suboptimally co-tune the merge policy, the buffer size
-
Dynamic Complexity under Definable Changes ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-11-26 Thomas Schwentick; Nils Vortmeier; Thomas Zeume
In the setting of dynamic complexity, the goal of a dynamic program is to maintain the result of a fixed query for an input database that is subject to changes, possibly using additional auxiliary relations. In other words, a dynamic program updates a materialized view whenever a base relation is changed. The update of query result and auxiliary relations is specified using first-order logic or, equivalently
-
Distributed Joins and Data Placement for Minimal Network Traffic ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-11-26 Orestis Polychroniou; Wangda Zhang; Kenneth A. Ross
Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm
-
A Relational Framework for Classifier Engineering ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-11-26 Benny Kimelfeld; Christopher Ré
In the design of analytical procedures and machine learning solutions, a critical and time-consuming task is that of feature engineering, for which various recipes and tooling approaches have been developed. In this article, we embark on the establishment of database foundations for feature engineering. We propose a formal framework for classification in the context of a relational database. The goal
-
Expressive Languages for Querying the Semantic Web ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-11-26 Marcelo Arenas; Georg Gottlob; Andreas Pieris
The problem of querying RDF data is a central issue for the development of the Semantic Web. The query language SPARQL has become the standard language for querying RDF since its W3C standardization in 2008. However, the 2008 version of this language missed some important functionalities: reasoning capabilities to deal with RDFS and OWL vocabularies, navigational capabilities to exploit the graph structure
-
K-Regret Queries Using Multiplicative Utility Functions ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-09-05 Jianzhong Qi; Fei Zuo; Hanan Samet; Jia Cheng Yao
The k-regret query aims to return a size-k subset S of a database D such that, for any query user who selects a data object from this size-k subset S rather than from database D, her regret ratio is minimized. The regret ratio here is modeled by the relative difference in the optimality between the locally optimal object in S and the globally optimal object in D. The optimality of a data object in
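The regret ratio described above can be written as (max over D of f(p) − max over S of f(p)) / max over D of f(p) for a utility function f. The minimal sketch below evaluates it for a given subset. It assumes positive utilities and uses a small, hypothetical sample of utility functions: one linear and one product-of-powers function. The maximum over a finite sample is only a lower bound on the maximum regret ratio over a whole function class, and this sketch does not implement the article's algorithms.

```python
from math import prod
from typing import Callable, List, Sequence

Point = Sequence[float]
Utility = Callable[[Point], float]


def regret_ratio(S: List[Point], D: List[Point], f: Utility) -> float:
    """Relative utility lost by choosing the best object in S instead of D.
    Assumes f is positive on D, so the denominator is non-zero."""
    best_d = max(f(p) for p in D)
    best_s = max(f(p) for p in S)
    return (best_d - best_s) / best_d


def max_regret_ratio(S: List[Point], D: List[Point], utilities: List[Utility]) -> float:
    """Worst case over a finite sample of utility functions (a lower bound
    on the maximum regret ratio over the whole function class)."""
    return max(regret_ratio(S, D, f) for f in utilities)


def linear(p: Point) -> float:  # hypothetical weights (0.5, 0.5)
    return 0.5 * p[0] + 0.5 * p[1]


def product(p: Point) -> float:  # hypothetical exponents (0.5, 0.5)
    return prod(x ** 0.5 for x in p)


D = [(1, 9), (9, 1), (5, 5), (2, 2)]
S = [(1, 9), (9, 1)]
print(regret_ratio(S, D, linear), max_regret_ratio(S, D, [linear, product]))
```

Under the linear utility, S loses nothing, because (1, 9), (9, 1), and (5, 5) all score 5. Under the product utility, the best object in S scores 3 while (5, 5) scores 5, giving a regret ratio of about 0.4.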
-
Answering FO+MOD Queries under Updates on Bounded Degree Databases ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-09-05 Christoph Berkholz; Jens Keppeler; Nicole Schweikardt
We investigate the query evaluation problem for fixed queries over fully dynamic databases, where tuples can be inserted or deleted. The task is to design a dynamic algorithm that immediately reports the new result of a fixed query after every database update. We consider queries in first-order logic (FO) and its extension with modulo-counting quantifiers (FO+MOD) and show that they can be efficiently
-
Lightweight Monitoring of Distributed Streams ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-09-05 Arnon Lazerson; Daniel Keren; Assaf Schuster
As data becomes dynamic, large, and distributed, there is increasing demand for what have become known as distributed stream algorithms. Since continuously collecting the data to a central server and processing it there is infeasible, a common approach is to define local conditions at the distributed nodes, such that—as long as they are maintained—some desirable global condition holds. Previous methods
-
Efficient Evaluation and Static Analysis for Well-Designed Pattern Trees with Projection ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-09-05 Pablo Barceló; Markus Kröll; Reinhard Pichler; Sebastian Skritek
Conjunctive queries (CQs) fail to provide an answer when the pattern described by the query does not exactly match the data. CQs might thus be too restrictive as a querying mechanism when data is semistructured or incomplete. The semantic web therefore provides a formalism—known as (projected) well-designed pattern trees (pWDPTs)—that tackles this problem: pWDPTs allow us to formulate queries that
-
TriAL ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-04-11 Leonid Libkin; Juan L. Reutter; Adrián Soto; Domagoj Vrgoč
Navigational queries over RDF data are viewed as one of the main applications of graph query languages, and yet the standard model of graph databases—essentially labeled graphs—is different from the triples-based model of RDF. While encodings of RDF databases into graph data exist, we show that even the most natural ones are bound to lose some functionality when used in conjunction with graph query
-
Building Efficient Query Engines in a High-Level Language ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-04-11 Amir Shaikhha; Yannis Klonatos; Christoph Koch
Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain
-
Estimating the Impact of Unknown Unknowns on Aggregate Query Results ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-04-11 Yeounoh Chung; Michael Lind Mortensen; Carsten Binnig; Tim Kraska
It is common practice for data scientists to acquire and integrate disparate data sources to achieve higher quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) Is the integrated data set complete? and (2) What is the impact of any unknown (i.e., unobserved) data on query results? In this work, we develop and analyze techniques to estimate the
-
Bounded Query Rewriting Using Views ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-04-11 Yang Cao; Wenfei Fan; Floris Geerts; Ping Lu
A query Q in a language L has a bounded rewriting using a set of L-definable views if there exists a query Q′ in L such that given any dataset D, Q(D) can be computed by Q′ that accesses only cached views and a small fraction D_Q of D. We consider datasets D that satisfy a set of access constraints, which are a combination of simple cardinality constraints and associated indices, such that the size
-
Practical Private Range Search in Depth ACM Trans. Database Syst. (IF 2.927) Pub Date : 2018-04-11 Ioannis Demertzis; Stavros Papadopoulos; Odysseas Papapetrou; Antonios Deligiannakis; Minos Garofalakis; Charalampos Papamanthou
We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on “practical” private range search (mainly in database venues) that attempt to strike a trade-off between efficiency and security. Nevertheless, these
-
Declarative Probabilistic Programming with Datalog ACM Trans. Database Syst. (IF 2.927) Pub Date : 2017-11-13 Vince Bárány; Balder Ten Cate; Benny Kimelfeld; Dan Olteanu; Zografoula Vagena
Probabilistic programming languages are used for developing statistical models. They typically consist of two components: a specification of a stochastic process (the prior) and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence
-
EmptyHeaded ACM Trans. Database Syst. (IF 2.927) Pub Date : 2017-11-13 Christopher R. Aberger; Andrew Lamb; Susan Tu; Andres Nötzli; Kunle Olukotun; Christopher Ré
There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, thereby placing the burden of efficiency on the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail)
-
Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation ACM Trans. Database Syst. (IF 2.927) Pub Date : 2017-11-13 Dario Colazzo; Giorgio Ghelli; Carlo Sartiani
Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension badly affects the complexity of basic operations, and, especially, makes membership NP-hard, which is unacceptable in most practical scenarios. In this article, we study the problem of membership
-
PrivBayes ACM Trans. Database Syst. (IF 2.927) Pub Date : 2017-11-13 Jun Zhang; Graham Cormode; Cecilia M. Procopiuc; Divesh Srivastava; Xiaokui Xiao
Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional
-
Blazes ACM Trans. Database Syst. (IF 2.927) Pub Date : 2017-11-13 Peter Alvaro; Neil Conway; Joseph M. Hellerstein; David Maier
Distributed consistency is perhaps the most-discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to
Contents have been reproduced by permission of the publishers.