• arXiv.cs.DB Pub Date : 2020-04-07
Salman Ahmed Shaikh; Komal Mariam; Hiroyuki Kitagawa; Kyoung-Sook Kim

Apache Flink is an open-source system for the scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data. Besides Flink, other scalable spatial data processing platforms including GeoSpark, Spatial Hadoop, GeoMesa and Parallel Secondo do not support

Updated: 2020-04-08
• arXiv.cs.DB Pub Date : 2020-04-06
Shivam Srivastava; Prithviraj Sen; Berthold Reinwald

Sparse and irregularly sampled multivariate time series are common in clinical, climate, financial and many other domains. Most recent approaches focus on classification, regression or forecasting tasks on such data. In forecasting, it is necessary to not only forecast the right value but also to forecast when that value will occur in the irregular time series. In this work, we present an approach

Updated: 2020-04-08
• arXiv.cs.DB Pub Date : 2020-04-07
Aoqian Zhang; Shaoxu Song; Yu Sun; Jianmin Wang

Missing numerical values are prevalent, e.g., owing to unreliable sensor reading, collection and transmission among heterogeneous sources. Unlike categorized data imputation over a limited domain, the numerical values suffer from two issues: (1) sparsity problem, the incomplete tuple may not have sufficient complete neighbors sharing the same/similar values for imputation, owing to the (almost) infinite

Updated: 2020-04-08
• arXiv.cs.DB Pub Date : 2020-04-07
Ciro M. Medeiros; Martin A. Musicante; Umberto S. Costa

RDF (Resource Description Framework) is a standard language to represent graph databases. Query languages for RDF databases usually include primitives to support path queries, linking pairs of vertices of the graph that are connected by a path of labels belonging to a given language. Languages such as SPARQL include support for paths defined by regular languages (by means of Regular Expressions). A

Updated: 2020-04-08
• arXiv.cs.DB Pub Date : 2020-04-07
Dimitrios Koutsoukos; Ingo Müller; Renato Marroquín; Gustavo Alonso

Today's data analytics displays an overwhelming diversity along many dimensions: data types, platforms, hardware acceleration, etc. As a result, system design often has to choose between depth and breadth: high efficiency for a narrow set of use cases or generality at a lower performance. In this paper, we pave the way to get the best of both worlds: We present Modularis, an execution layer for data

Updated: 2020-04-08
• arXiv.cs.DB Pub Date : 2017-10-03
Jose Picado; Arash Termehchy; Sudhanshu Pathak; Alan Fern; Praveen Ilango; Yunqiao Cai

Relational databases are valuable resources for learning novel and interesting relations and concepts. In order to constrain the search through the large space of candidate definitions, users must tune the algorithm by specifying a language bias. Unfortunately, specifying the language bias is done via trial and error and is guided by the expert's intuitions. We propose AutoBias, a system that leverages

Updated: 2020-04-08
• arXiv.cs.DB Pub Date : 2019-09-27
Lhouari Nourine; Jean Marc Petit

Incomplete information makes it possible to deal with data containing errors, uncertainty or inconsistencies and has been studied in different application areas such as query answering or data integration. In this paper, we investigate classical functional dependencies in the presence of incomplete information. To do so, we associate each attribute with a comparability function which maps every pair of domain values to

Updated: 2020-04-06
• arXiv.cs.DB Pub Date : 2020-04-02
Guanyu Feng; Zixuan Ma; Daixuan Li; Xiaowei Zhu; Yanzheng Cai; Wentao Han; Wenguang Chen

Graphs in the real world are constantly changing and of large scale. In processing these evolving graphs, the combination of update workloads (updating vertices and edges in a streaming manner) and analytical (performing graph algorithms incrementally) workloads is ubiquitous. Throughput, latency, and granularity are three key requirements in processing evolving graphs with such combined workloads

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2020-04-01
Benjamin D. Killeen; Jie Ying Wu; Kinjal Shah; Anna Zapaishchykova; Philipp Nikutta; Aniruddha Tamhane; Shreya Chakraborty; Jinchi Wei; Tiger Gao; Mareike Thies; Mathias Unberath

As the coronavirus disease 2019 (COVID-19) becomes a global pandemic, policy makers must enact interventions to stop its spread. Data driven approaches might supply information to support the implementation of mitigation and suppression strategies. To facilitate research in this direction, we present a machine-readable dataset that aggregates relevant data from governmental, journalistic, and academic

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2020-04-02
Daniel Kang; Edward Gan; Peter Bailis; Tatsunori Hashimoto; Matei Zaharia

Due to the falling costs of data acquisition and storage, researchers and industry analysts often want to find all instances of rare events in large datasets. For instance, scientists can cheaply capture thousands of hours of video, but are limited by the need to manually inspect all the video to identify relevant objects and events. To reduce this cost, recent work proposes to use cheap proxy models

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2020-04-02
Jongik Kim

In this paper, we study the problem of graph similarity search with graph edit distance (GED) constraints. Due to the NP-hardness of GED computation, existing solutions to this problem adopt the filtering-and-verification framework with a main focus on the filtering phase to generate a small number of candidate graphs. However, they have a limitation that the number of candidates grows extremely rapidly

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2019-10-20
Maciej Besta; Emanuel Peter; Robert Gerstenberger; Marc Fischer; Michał Podstawski; Claude Barthels; Gustavo Alonso; Torsten Hoefler

Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2020-03-31
Amine Mhedhbi; Pranjal Gupta; Shahid Khaliq; Semih Salihoglu

Graph database management systems (GDBMSs) are highly optimized to perform very fast joins of vertices by indexing the neighbourhoods of vertices in adjacency list indexes. However, existing GDBMSs have system-specific and fixed adjacency list index structures, which makes each system highly efficient on only a fixed set of workloads. We describe a highly flexible and lightweight indexing sub-system

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2016-12-14
Yike Liu; Tara Safavi; Abhilash Dighe; Danai Koutra

While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing

Updated: 2020-04-03
• arXiv.cs.DB Pub Date : 2020-03-30
Iovka Boneva; Jose Lozano; Sławek Staworko

We investigate the data exchange from relational databases to RDF graphs inspired by R2RML with the addition of target shape schemas. We study the problems of consistency i.e., checking that every source instance admits a solution, and certain query answering i.e., finding answers present in every solution. We identify the class of constructive relational to RDF data exchange that uses IRI constructors

Updated: 2020-04-01
• arXiv.cs.DB Pub Date : 2020-03-31
Aiping Xiong; Tianhao Wang; Ninghui Li; Somesh Jha

Differential privacy protects an individual's privacy by perturbing data on an aggregated level (DP) or individual level (LDP). We report four online human-subject experiments investigating the effects of using different approaches to communicate differential privacy techniques to laypersons in a health app data collection setting. Experiments 1 and 2 investigated participants' data disclosure decisions

Updated: 2020-04-01
• arXiv.cs.DB Pub Date : 2020-03-31
Aaron Feng; Shuwei Chen; Yuliang Li; Hiroshi Matsuda; Hidekazu Tamaki; Wang-Chiew Tan

Existing e-commerce search engines typically support search only over objective attributes, such as price and location, leaving the more desirable subjective attributes, such as romantic vibe and work-life balance, unsearchable. We found that this is also the case for Recruit Group, which operates a wide range of online booking and search services, including jobs, travel, housing, bridal, dining, beauty

Updated: 2020-04-01
• arXiv.cs.DB Pub Date : 2020-03-31
Xinyue Wang; Zhiwu Xie

The WARC file format is widely used by web archives to preserve collected web content for future use. With the rapid growth of web archives and the increasing interest in reusing these archives as big data sources for statistical and analytical research, the speed at which these data can be turned into insights becomes critical. In this paper we show that the WARC format carries significant performance penalties for

Updated: 2020-04-01
• arXiv.cs.DB Pub Date : 2019-11-05
Edward E. Seabolt; Gowri Nayar; Harsha Krishnareddy; Akshay Agarwal; Kristen L. Beck; Ignacio Terrizzano; Eser Kandogan; Mary Roth; Vandana Mukherjee; James H. Kaufman

The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics. With the increasing availability of genomic data, traditional bioinformatic tools require substantial computational time and the creation of ever-larger indices each time a researcher seeks to gain insight from the data. To address these challenges

Updated: 2020-04-01
• arXiv.cs.DB Pub Date : 2020-03-27
Martin Grohe

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively

Updated: 2020-03-31
• arXiv.cs.DB Pub Date : 2020-03-29
Hao Xu; Paulo Valente Klaine; Oluwakayode Onireti; Bin Cao; Muhammad Imran; Lei Zhang

The sixth generation (6G) network must provide performance superior to previous generations in order to meet the requirements of emerging services and applications, such as multi-gigabit transmission rate, even higher reliability, sub 1 millisecond latency and ubiquitous connection for Internet of Everything. However, with the scarcity of spectrum resources, efficient resource management and sharing

Updated: 2020-03-31
• arXiv.cs.DB Pub Date : 2020-03-29
Daniel Garijo; María Poveda-Villalón

With the adoption of Semantic Web technologies, an increasing number of vocabularies and ontologies have been developed in different domains, ranging from Biology to Agronomy or Geosciences. However, many of these ontologies are still difficult to find, access and understand by researchers due to a lack of documentation, URI resolving issues, versioning problems, etc. In this chapter we describe guidelines

Updated: 2020-03-31
• arXiv.cs.DB Pub Date : 2020-03-29
Jinfei Liu

Data-driven machine learning (ML) has witnessed great successes across a variety of application domains. Since ML model training relies crucially on a large amount of data, there is a growing demand for high quality data to be collected for ML model training. However, from the data owners' perspective, it is risky for them to contribute their data. To incentivize data contribution, it would be ideal

Updated: 2020-03-31
• arXiv.cs.DB Pub Date : 2020-03-29
Venkata Vamsikrishna Meduri; Lucian Popa; Prithviraj Sen; Mohamed Sarwat

Entity Matching (EM) is a core data cleaning task, aiming to identify different mentions of the same real-world entity. Active learning is one way to address the challenge of scarce labeled data in practice, by dynamically collecting the necessary examples to be labeled by an Oracle and refining the learned model (classifier) upon them. In this paper, we build a unified active learning benchmark framework

Updated: 2020-03-31
• arXiv.cs.DB Pub Date : 2019-06-24
Mahmoud Abo Khamis; Phokion G. Kolaitis; Hung Q. Ngo; Dan Suciu

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the

Updated: 2020-03-31
• arXiv.cs.DB Pub Date : 2019-12-01
Yang Li

The area query, to find all elements contained in a specified area from a certain set of spatial objects, is a very important spatial query widely required in various fields. A number of approaches have been proposed to implement this query, the best known of which is to obtain a rough candidate set through spatial indexes and then refine the candidates through geometric validations to get the final

Updated: 2020-03-31
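
The filter-and-refine scheme mentioned in the entry above (a rough candidate set from a spatial index, then geometric validation) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the objects are 2D points, the index is a hypothetical uniform grid (`build_grid`), the query area is a simple polygon, and refinement uses a standard ray-casting point-in-polygon test. All function names are made up for this sketch.

```python
from collections import defaultdict


def build_grid(points, cell):
    """Hypothetical spatial index: bucket points into square cells of side `cell`."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell), int(y // cell))].append((x, y))
    return grid


def point_in_polygon(pt, poly):
    """Ray casting: count how many polygon edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def area_query(grid, cell, poly):
    """Filter by the polygon's bounding box, then refine with the exact test."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    cx_range = range(int(min(xs) // cell), int(max(xs) // cell) + 1)
    cy_range = range(int(min(ys) // cell), int(max(ys) // cell) + 1)
    # Filter step: every point stored in a cell overlapping the bounding box.
    candidates = [pt for cx in cx_range for cy in cy_range
                  for pt in grid.get((cx, cy), [])]
    # Refine step: exact geometric validation of each candidate.
    return [pt for pt in candidates if point_in_polygon(pt, poly)]
```

The refinement step is where the cost usually concentrates, since every candidate needs an exact geometric test; a finer grid shrinks the candidate set at the price of more cells to visit.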
• arXiv.cs.DB Pub Date : 2020-03-27
Aoqian Zhang; Shaoxu Song; Jianmin Wang; Philip S. Yu

Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus more on anomaly detection but not on repairing the detected anomalies. By simply filtering out the dirty data via anomaly detection, applications could still be unreliable over the incomplete time series. Instead of simply discarding anomalies, we propose to (iteratively) repair them in time

Updated: 2020-03-30
• arXiv.cs.DB Pub Date : 2020-03-25
Giovanni Micale; Vincenzo Bonnici; Alfredo Ferro; Dennis Shasha; Rosalba Giugno; Alfredo Pulvirenti

The Subgraph Matching (SM) problem consists of finding all the embeddings of a given small graph, called the query, into a large graph, called the target. The SM problem has been widely studied for simple graphs, i.e. graphs where there is exactly one edge between two nodes and nodes have single labels, but few approaches have been devised for labeled multigraphs, i.e. graphs having possibly multiple

Updated: 2020-03-27
• arXiv.cs.DB Pub Date : 2020-03-25
Sheng Wang; Zhifeng Bao; J. Shane Culpepper; Gao Cong

Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing, storage, common trajectory

Updated: 2020-03-27
• arXiv.cs.DB Pub Date : 2020-03-25
Chuan Lei; Rana Alotaibi; Abdul Quamar; Vasilis Efthymiou; Fatma Özcan

Enterprises are creating domain-specific knowledge graphs by curating and integrating their business data from multiple sources. The data in these knowledge graphs can be described using ontologies, which provide a semantic abstraction to define the content in terms of the entities and the relationships of the domain. The rich semantic relationships in an ontology contain a variety of opportunities

Updated: 2020-03-27
• arXiv.cs.DB Pub Date : 2016-06-20
Yanhong A. Liu; Scott D. Stoller

Logic rules and inference are fundamental in computer science and have been studied extensively. However, prior semantics of logic languages can have subtle implications and can disagree significantly, on even very simple programs, including in attempting to solve the well-known Russell's paradox. These semantics are often non-intuitive and hard-to-understand when unrestricted negation is used in recursion

Updated: 2020-03-27
• arXiv.cs.DB Pub Date : 2019-12-23
Marcelo Arenas; Pablo Barceló; Mikaël Monet

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query $q$, we consider the following two problems: given as input an incomplete database $D$, (a) return

Updated: 2020-03-27
• arXiv.cs.DB Pub Date : 2020-03-24
Piero Giacomelli

In this paper we describe a new approach to the well-known suffix-array algorithm using Big Table Data Technology. We demonstrate how it is possible to refactor a well-known algorithm by taking advantage of a high-performance distributed datastore, to illustrate the advantages of using cloud datastore technology for storing large text sequences and retrieving them. A case

Updated: 2020-03-26
• arXiv.cs.DB Pub Date : 2020-03-19
Han Liu; Shantao Liu

EQL, also known as Extremely Simple Query Language, can be widely used in knowledge graphs, precise search, strong artificial intelligence, databases, smart speakers, patent search, and other fields. EQL adopts the principle of minimalism in design and pursues simplicity and ease of learning so that everyone can master it quickly. The EQL language and lambda calculus are interconvertible, which reveals

Updated: 2020-03-26
• arXiv.cs.DB Pub Date : 2018-01-30
Alex Galakatos; Michael Markovitch; Carsten Binnig; Rodrigo Fonseca; Tim Kraska

Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitive and consume valuable system resources. In fact, a recent study showed that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a

Updated: 2020-03-26
• arXiv.cs.DB Pub Date : 2020-03-20
Simeon Krastnikov; Florian Kerschbaum; Douglas Stebila

A major algorithmic challenge in designing applications intended for secure remote execution is ensuring that they are oblivious to their inputs, in the sense that their memory access patterns do not leak sensitive information to the server. This problem is particularly relevant to cloud databases that wish to allow queries over the client's encrypted data. One of the major obstacles to such a goal

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-20
Jonathan J. Harris; Ching-Hua Chen; Mohammed J. Zaki

Whereas it has become easier for individuals to track their personal health data (e.g., heart rate, step count, food log), there is still a wide chasm between the collection of data and the generation of meaningful explanations to help users better understand what their data means to them. With an increased comprehension of their data, users will be able to act upon the newfound information and work

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-21
Shi Li; Sai Vikneshwar Mani Jayaraman; Atri Rudra

In this paper, we initiate a theoretical study of what we call the join covering problem. We are given a natural join query instance $Q$ on $n$ attributes and $m$ relations $(R_i)_{i \in [m]}$. Let $J_{Q} = \Join_{i=1}^m R_i$ denote the join output of $Q$. In addition to $Q$, we are given a parameter $\Delta: 1\le \Delta\le n$ and our goal is to compute the smallest subset $\mathcal{T}_{Q, \Delta}

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-21
Anna Fariha; Suman Nath; Alexandra Meliou

Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and group

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-21
Antonis Kontaxakis; Nikos Giatrakos; Antonios Deligiannakis

In this work, we detail the design and structure of a Synopses Data Engine (SDE) which combines the virtues of parallel processing and stream summarization towards delivering interactive analytics at extreme scale. Our SDE is built on top of Apache Flink and implements a synopsis-as-a-service paradigm. In that it achieves (a) concurrently maintaining thousands of synopses of various types for thousands

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-21
Nadiia Chepurko; Ryan Marcus; Emanuel Zgraggen; Raul Castro Fernandez; Tim Kraska; David Karger

Automatic machine learning (AML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmentation

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-22
Naoto Ohsaka

Influence maximization is among the most fundamental algorithmic problems in social influence analysis. Over the last decade, a great effort has been devoted to developing efficient algorithms for influence maximization, so that identifying the "best" algorithm has become a demanding task. In SIGMOD'17, Arora, Galhotra, and Ranu reported benchmark results on eleven existing algorithms and demonstrated

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-21
Leonidas Fegaras; Md Hasanuzzaman Noor

Large volumes of data generated by scientific experiments and simulations come in the form of arrays, while programs that analyze these data are frequently expressed in terms of array operations in an imperative, loop-based language. But, as datasets grow larger, new frameworks in distributed Big Data analytics have become essential tools for large-scale scientific computing. Scientists, who are typically

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-23
Pingcheng Ruan; Dumitrel Loghin; Quang-Trung Ta; Meihui Zhang; Gang Chen; Beng Chin Ooi

Smart contracts have enabled blockchain systems to evolve from simple cryptocurrency platforms, such as Bitcoin, to general transactional systems, such as Ethereum. Catering for emerging business requirements, a new architecture called execute-order-validate has been proposed in Hyperledger Fabric to support parallel transactions and improve the blockchain's throughput. However, this new architecture

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-23
Jinfei Liu

Shapley value is a concept in cooperative game theory for measuring the contribution of each participant, which was named in honor of Lloyd Shapley. Shapley value has been recently applied in data marketplaces for compensation allocation based on their contribution to the models. Shapley value is the only value division scheme used for compensation allocation that meets three desirable criteria: group

Updated: 2020-03-24
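
To make the Shapley value in the entry above concrete, here is a minimal brute-force sketch. It averages each participant's marginal contribution over every possible ordering, which is the textbook definition and is exponential in the number of participants; it is not the paper's algorithm. The function name `shapley_values` and the utility table `UTILITY` (standing in for "model quality obtained from a coalition of data owners") are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical utilities for three data owners; C adds nothing to any coalition.
UTILITY = {
    frozenset(): 0.0,
    frozenset("A"): 0.6, frozenset("B"): 0.6, frozenset("C"): 0.0,
    frozenset("AB"): 0.9, frozenset("AC"): 0.6, frozenset("BC"): 0.6,
    frozenset("ABC"): 0.9,
}

phi = shapley_values(["A", "B", "C"], UTILITY.__getitem__)
```

In this toy game A and B are interchangeable, so they receive equal shares, C receives nothing, and the shares add up to the utility of the full coalition.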
• arXiv.cs.DB Pub Date : 2019-03-31

A private data federation enables data owners to pool their information for querying without disclosing their secret tuples to one another. Here, a client queries the union of the records of all data owners. The data owners work together to answer the query using privacy-preserving algorithms that prevent them from learning unauthorized information about the inputs of their peers. Only the client,

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2019-11-13
Nikolaos Tziavelis; Deepak Ajwani; Wolfgang Gatterbauer; Mirek Riedewald; Xiaofeng Yang

We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-16
Michael A. Georgiou; Aristodemos Paphitis; Michael Sirivianos; Herodotos Herodotou

With the advent of the Internet and Internet-connected devices, modern business applications can experience rapid increases as well as variability in transactional workloads. Database replication has been employed to scale performance and improve availability of relational databases but past approaches have suffered from various issues including limited scalability, performance versus consistency tradeoffs

Updated: 2020-03-24
• arXiv.cs.DB Pub Date : 2020-03-16
Julien Romero; Nicoleta Preda; Antoine Amarilli; Fabian Suchanek

A view with a binding pattern is a parameterized query on a database. Such views are used, e.g., to model Web services. To answer a query on such views, the views have to be orchestrated together in execution plans. We show how queries can be rewritten into equivalent execution plans, which are guaranteed to deliver the same results as the query on all databases. We provide a correct and complete algorithm

Updated: 2020-03-20
• arXiv.cs.DB Pub Date : 2020-03-18
Zhe Li; Tsz Nam Chan; Man Lung Yiu; Christian S. Jensen

Range aggregate queries find frequent application in data analytics. In some use cases, approximate results are preferred over accurate results if they can be computed rapidly and satisfy approximation guarantees. Inspired by a recent indexing approach, we provide means of representing a discrete point data set by continuous functions that can then serve as compact index structures. More specifically

Updated: 2020-03-19
• arXiv.cs.DB Pub Date : 2020-03-18
Teemu Lehto; Markku Hinkka

A common challenge for improving business processes in large organizations is that business people in charge of the operations are lacking a fact-based understanding of the execution details, process variants, and exceptions taking place in business operations. While existing process mining methodologies can discover these details based on event logs, it is challenging to communicate the process mining

Updated: 2020-03-19
• arXiv.cs.DB Pub Date : 2020-03-17
Md Amiruzzaman; Suphanut Jamonnak

This paper presents a new application for the multi-dimensional Skyline query. The idea presented in this paper can be used to find the best shopping malls based on users' requirements. A web-based application was used to simulate the problem and the proposed solution. Also, a mathematical definition was developed to define the problem and show how the multi-dimensional Skyline query can be used to solve complex problems

Updated: 2020-03-19
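
For readers unfamiliar with the Skyline query named in the entry above, the following is a minimal sketch of the general idea: keep every item that no other item dominates. It is a naive quadratic scan, not the paper's formulation. It assumes that smaller is better in every dimension, and the example attributes (distance, price level) are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Naive O(n^2) skyline: the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical malls described as (distance in km, price level).
malls = [(1, 9), (2, 4), (3, 5), (5, 1), (4, 4)]
best = skyline(malls)
```

Here `(3, 5)` and `(4, 4)` are dropped because `(2, 4)` is at least as good on both attributes, while the three remaining malls each represent a different trade-off between distance and price.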
• arXiv.cs.DB Pub Date : 2020-03-16
Christopher Baik; Zhongjun Jin; Michael Cafarella; H. V. Jagadish

Querying a relational database is difficult because it requires users both to know the SQL language and to be familiar with the schema. On the other hand, many users possess enough domain familiarity or expertise to describe their desired queries by alternative means. For such users, two major alternatives to writing SQL are natural language interfaces (NLIs) and programming-by-example (PBE). Both of

Updated: 2020-03-18
• arXiv.cs.DB Pub Date : 2020-03-17
Jakob Blomer; Philippe Canal; Axel Naumann; Danilo Piparo

The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts ("branches") that are really used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event classes without explicitly defining data schemas. In

Updated: 2020-03-18
• arXiv.cs.DB Pub Date : 2020-03-10
Zequn Sun; Qingheng Zhang; Wei Hu; Chengming Wang; Muhao Chen; Farahnaz Akrami; Chengkai Li

Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging

Updated: 2020-03-18
• arXiv.cs.DB Pub Date : 2020-03-17
Pablo Barcelo; Cristina Feier; Carsten Lutz; Andreas Pieris

In ontology-mediated querying, description logic (DL) ontologies are used to enrich incomplete data with domain knowledge, which results in more complete answers to queries. However, the evaluation of ontology-mediated queries (OMQs) over relational databases is computationally hard. This raises the question of when OMQ evaluation is efficient, in the sense of being tractable in combined complexity or

Updated: 2020-03-18
• arXiv.cs.DB Pub Date : 2019-09-02
Mengwei Xu; Xiwen Zhang; Yunxin Liu; Xuanzhe Liu; Felix Xiaozhu Lin

Today's analytics-powered cameras are still limited to urban, residential areas where power/network resources abound. To expand them to more diverse environments, especially those that are off-grid and highly network-constrained, the cameras shall be "autonomous", i.e., independent from external power supply and compute infrastructure. Can autonomous cameras do any useful analytics? Our response is iCam

Updated: 2020-03-18
• arXiv.cs.DB Pub Date : 2019-10-02
Supreeth Shastri; Vinay Banakar; Melissa Wasserman; Arun Kumar; Vijay Chidambaram

The General Data Protection Regulation (GDPR) provides new rights and protections to European people concerning their personal data. We analyze GDPR from a systems perspective, translating its legal articles into a set of capabilities and characteristics that compliant systems must support. Our analysis reveals the phenomenon of metadata explosion, wherein large quantities of metadata need to be stored

Updated: 2020-03-18
• arXiv.cs.DB Pub Date : 2020-03-12
Vikram Sreekanti; Chenggang Wu; Saurav Chhatrapati; Joseph E. Gonzalez; Joseph M. Hellerstein; Jose M. Faleiro

Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would

Updated: 2020-03-16
• arXiv.cs.DB Pub Date : 2020-03-13
Omid Jafari; Parth Nagarkar; Johnathan Montaño

Many large multimedia applications require efficient processing of nearest neighbor queries. Often, multimedia data are represented as a collection of important high-dimensional feature vectors. Locality Sensitive Hashing (LSH) is a very popular approximate technique for finding nearest neighbors in high-dimensional spaces. In order to find top-k similar multimedia objects, existing LSH techniques

Updated: 2020-03-16
Contents have been reproduced by permission of the publishers.