Current journal: arXiv - CS - Databases
  • Alignment Approximation for Process Trees
    arXiv.cs.DB Pub Date : 2020-09-29
    Daniel Schuster; Sebastiaan van Zelst; Wil M. P. van der Aalst

    Comparing observed behavior (event data generated during process executions) with modeled behavior (process models) is an essential step in process mining analyses. Alignments are the de-facto standard technique for calculating conformance checking statistics. However, the calculation of alignments is computationally complex since a shortest path problem must be solved on a state space which grows

    Updated: 2020-09-30
  • Database Repairing with Soft Functional Dependencies
    arXiv.cs.DB Pub Date : 2020-09-29
    Nofar Carmeli; Martin Grohe; Benny Kimelfeld; Ester Livshits; Muhammad Tibi

    A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost)

    Updated: 2020-09-30
  • The Shapley Value of Inconsistency Measures for Functional Dependencies
    arXiv.cs.DB Pub Date : 2020-09-29
    Ester Livshits; Benny Kimelfeld

    Quantifying the inconsistency of a database is motivated by various goals including reliability estimation for new datasets and progress indication in data cleaning. Another goal is to attribute to individual tuples a level of responsibility to the overall inconsistency, and thereby prioritize tuples in the explanation or inspection of dirt. Therefore, inconsistency quantification and attribution have

    Updated: 2020-09-30
  • In-Order Sliding-Window Aggregation in Worst-Case Constant Time
    arXiv.cs.DB Pub Date : 2020-09-29
    Kanat Tangwongsan; Martin Hirzel; Scott Schneider

    Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be expressed as binary operators that are associative but not necessarily commutative nor invertible. Non-invertible operators, however, are difficult to support efficiently. In a 2017 conference paper, we introduced DABA, the first algorithm

    Updated: 2020-09-30
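
The abstract above is cut off before it describes DABA, so the following is not that algorithm. As a rough illustration of the problem setting only, this sketch uses the well-known two-stacks technique: it supports in-order insert, evict, and query for any associative operator without needing an inverse or commutativity, but only in amortized (not worst-case) constant time. The class name `TwoStacksWindow` and its method names are hypothetical, chosen for this example.

```python
from typing import Any, Callable, List, Tuple


class TwoStacksWindow:
    """FIFO sliding window with aggregation over an associative operator.

    Illustrative two-stacks sketch (amortized O(1) per operation);
    this is NOT the DABA algorithm from the paper.
    """

    def __init__(self, op: Callable[[Any, Any], Any], identity: Any) -> None:
        self.op = op
        self.identity = identity
        # Each stack entry is (value, running aggregate).
        # front: top is the oldest element; its aggregate covers the whole front stack.
        # back: top is the newest element; its aggregate covers the whole back stack.
        self.front: List[Tuple[Any, Any]] = []
        self.back: List[Tuple[Any, Any]] = []

    def insert(self, value: Any) -> None:
        """Append the newest element to the window."""
        agg = self.op(self.back[-1][1], value) if self.back else value
        self.back.append((value, agg))

    def evict(self) -> None:
        """Remove the oldest element from the window."""
        if not self.front:
            # Move everything from back to front, newest first, so that
            # the oldest element ends up on top of the front stack.
            while self.back:
                value, _ = self.back.pop()
                agg = self.op(value, self.front[-1][1]) if self.front else value
                self.front.append((value, agg))
        self.front.pop()

    def query(self) -> Any:
        """Aggregate of the window contents, in oldest-to-newest order."""
        f = self.front[-1][1] if self.front else self.identity
        b = self.back[-1][1] if self.back else self.identity
        return self.op(f, b)


# String concatenation is associative but not commutative,
# so it checks that element order is preserved.
w = TwoStacksWindow(lambda a, b: a + b, "")
for ch in "abc":
    w.insert(ch)
after_inserts = w.query()
w.evict()
after_evict = w.query()
```

The occasional full transfer from `back` to `front` is what makes a single `evict` linear in the worst case; avoiding that spike is the kind of guarantee the paper's title refers to.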
  • Modelling service-oriented systems and cloud services with Heraklit
    arXiv.cs.DB Pub Date : 2020-09-29
    Peter Fettke; Wolfgang Reisig

    Modern and next generation digital infrastructures are technically based on service oriented structures, cloud services, and other architectures that compose large systems from smaller subsystems. The composition of subsystems is particularly challenging, as the subsystems themselves may be represented in different languages, modelling methods, etc. It is quite challenging to precisely conceive, understand

    Updated: 2020-09-30
  • Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version)
    arXiv.cs.DB Pub Date : 2020-09-28
    Zuozhi Wang; Kai Zeng; Botong Huang; Wei Chen; Xiaozong Cui; Bo Wang; Ji Liu; Liya Fan; Dachuan Qu; Zhenyu Hou; Tao Guan; Chen Li; Jingren Zhou

    Incremental processing is widely-adopted in many applications, ranging from incremental view maintenance, stream computing, to recently emerging progressive data warehouse and intermittent query processing. Despite many algorithms developed on this topic, none of them can produce an incremental plan that always achieves the best performance, since the optimal plan is data dependent. In this paper,

    Updated: 2020-09-30
  • GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination
    arXiv.cs.DB Pub Date : 2020-09-23
    Tianhui Shi; Mingshu Zhai; Yi Xu; Jidong Zhai

    Graph pattern matching, which aims to discover structural patterns in graphs, is considered one of the most fundamental graph mining problems in many real applications. Despite previous efforts, existing systems face two main challenges. First, inherent symmetry existing in patterns can introduce a large amount of redundant computation. Second, different matching orders for a pattern have significant

    Updated: 2020-09-30
  • CAT STREET: Chronicle Archive of Tokyo Street-fashion
    arXiv.cs.DB Pub Date : 2020-09-28
    Satoshi Takahashi; Keiko Yamaguchi; Asuka Watanabe

    The analysis of daily life fashion trends can help us understand our societies and human cultures profoundly. However, no appropriate database exists that includes images illustrating what people wore in their daily lives over an extended period. In this study, we propose a new fashion image archive, Chronicle Archive of Tokyo Street-fashion (CAT STREET), to shed light on daily life fashion trends

    Updated: 2020-09-29
  • Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation
    arXiv.cs.DB Pub Date : 2020-09-27
    Olga Poppe; Tayo Amuneke; Dalitso Banda; Aritra De; Ari Green; Manon Knoertzer; Ehi Nosakhare; Karthik Rajendran; Deepak Shankargouda; Meina Wang; Alan Au; Carlo Curino; Qun Guo; Alekh Jindal; Ajay Kalhan; Morgan Oslake; Sonia Parchani; Vijay Ramani; Raj Sellappan; Saikat Sen; Sheetal Shrotri; Soundararajan Srinivasan; Ping Xia; Shize Xu; Alicia Yang; Yiwen Zhu

    Microsoft Azure is dedicated to guarantee high quality of service to its customers, in particular, during periods of high customer activity, while controlling cost. We employ a Data Science (DS) driven solution to predict user load and leverage these predictions to optimize resource allocation. To this end, we built the Seagull infrastructure that processes per-server telemetry, validates the data

    Updated: 2020-09-29
  • A Big Data Lake for Multilevel Streaming Analytics
    arXiv.cs.DB Pub Date : 2020-09-25
    Ruoran Liu; Haruna Isah; Farhana Zulkernine

    Large organizations are seeking to create new architectures and scalable platforms to effectively handle data management challenges due to the explosive nature of data rarely seen in the past. These data management challenges are largely posed by the availability of streaming data at high velocity from various sources in multiple formats. The changes in data paradigm have led to the emergence of new

    Updated: 2020-09-29
  • Towards a Natural Language Query Processing System
    arXiv.cs.DB Pub Date : 2020-09-25
    Chantal Montgomery; Haruna Isah; Farhana Zulkernine

    Tackling the information retrieval gap between non-technical database end-users and those with the knowledge of formal query languages has been an interesting area of data management and analytics research. The use of natural language interfaces to query information from databases offers the opportunity to bridge the communication challenges between end-users and systems that use formal query languages

    Updated: 2020-09-29
  • Spatial-Temporal Demand Forecasting and Competitive Supply via Graph Convolutional Networks
    arXiv.cs.DB Pub Date : 2020-09-24
    Bolong Zheng; Qi Hu; Lingfeng Ming; Jilin Hu; Lu Chen; Kai Zheng; Christian S. Jensen

    We consider a setting with an evolving set of requests for transportation from an origin to a destination before a deadline and a set of agents capable of servicing the requests. In this setting, an assignment authority is to assign agents to requests such that the average idle time of the agents is minimized. An example is the scheduling of taxis (agents) to meet incoming requests for trips while

    Updated: 2020-09-28
  • Effective and Efficient Variable-Length Data Series Analytics
    arXiv.cs.DB Pub Date : 2020-09-22
    Michele Linardi

    In the last twenty years, data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different mining problems work by means of similarity search. In this regard, all the proposed solutions require the prior knowledge of the series length on which similarity search is performed. In

    Updated: 2020-09-25
  • An Analysis of Concurrency Control Protocols for In-Memory Databases with CCBench (Extended Version)
    arXiv.cs.DB Pub Date : 2020-09-24
    Takayuki Tanabe; Takashi Hoshino; Hideyuki Kawashima; Osamu Tatebe

    This paper presents yet another concurrency control analysis platform, CCBench. CCBench supports seven protocols (Silo, TicToc, MOCC, Cicada, SI, SI with latch-free SSN, 2PL) and seven versatile optimization methods and enables the configuration of seven workload parameters. We analyzed the protocols and optimization methods using various workload parameters and a thread count of 224. Previous studies

    Updated: 2020-09-25
  • Compressed Key Sort and Fast Index Reconstruction
    arXiv.cs.DB Pub Date : 2020-09-24
    Yongsik Kwon; Cheol Ryu; Sang Kyun Cha; Arthur H. Lee; Kunsoo Park; Bongki Moon

    In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While the actual compression ratio may vary depending on the characteristics of datasets (an average of 2.76 to one compression ratio was observed in our experiments)

    Updated: 2020-09-25
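
The claim in the abstract above can be illustrated with a small sketch. It assumes that a "distinction bit" is the most significant bit position at which two adjacent keys in sorted order differ; the abstract is truncated, so this definition is an assumption rather than a quotation from the paper. The function names `distinction_bit_positions` and `compress` are hypothetical, and keys are modeled as non-negative Python integers.

```python
from typing import List


def distinction_bit_positions(sorted_keys: List[int]) -> List[int]:
    """Collect, for each adjacent pair of sorted keys, the index of the
    most significant bit where the two keys differ (assumed definition)."""
    positions = set()
    for a, b in zip(sorted_keys, sorted_keys[1:]):
        diff = a ^ b
        if diff:  # skip duplicate keys
            positions.add(diff.bit_length() - 1)
    return sorted(positions, reverse=True)  # most significant first


def compress(key: int, positions: List[int]) -> int:
    """Keep only the bits of `key` at the given positions, preserving
    their relative significance."""
    out = 0
    for p in positions:
        out = (out << 1) | ((key >> p) & 1)
    return out


keys = sorted([0b10110010, 0b10110111, 0b11000001, 0b00010100])
positions = distinction_bit_positions(keys)
compressed = [compress(k, positions) for k in keys]
```

In this example three bit positions out of eight are kept, and the compressed keys appear in the same relative order as the full keys.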
  • Algorithms for a Topology-aware Massively Parallel Computation Model
    arXiv.cs.DB Pub Date : 2020-09-24
    Xiao Hu; Paraschos Koutris; Spyros Blanas

    Most of the prior work in massively parallel data processing assumes homogeneity, i.e., every computing unit has the same computational capability, and can communicate with every other unit with the same latency and bandwidth. However, this strong assumption of a uniform topology rarely holds in practical settings, where computing units are connected through complex networks. To address this issue

    Updated: 2020-09-25
  • Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
    arXiv.cs.DB Pub Date : 2020-09-24
    Gerhard Weikum; Luna Dong; Simon Razniewski; Fabian Suchanek

    Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret

    Updated: 2020-09-25
  • Segmented Pairwise Distance for Time Series with Large Discontinuities
    arXiv.cs.DB Pub Date : 2020-09-23
    Jiabo He; Sarah Erfani; Sudanthi Wijewickrema; Stephen O'Leary; Kotagiri Ramamohanarao

    Time series with large discontinuities are common in many scenarios. However, existing distance-based algorithms (e.g., DTW and its derivative algorithms) may perform poorly in measuring distances between these time series pairs. In this paper, we propose the segmented pairwise distance (SPD) algorithm to measure distances between time series with large discontinuities. SPD is orthogonal to distance-based

    Updated: 2020-09-24
  • Supervised Ontology and Instance Matching with MELT
    arXiv.cs.DB Pub Date : 2020-09-20
    Sven Hertling; Jan Portisch; Heiko Paulheim

    In this paper, we present MELT-ML, a machine learning extension to the Matching and EvaLuation Toolkit (MELT) which facilitates the application of supervised learning for ontology and instance matching. Our contributions are twofold: We present an open source machine learning extension to the matching toolkit as well as two supervised learning use cases demonstrating the capabilities of the new extension

    Updated: 2020-09-24
  • Towards a Flexible Embedding Learning Framework
    arXiv.cs.DB Pub Date : 2020-09-23
    Chin-Chia Michael Yeh; Dhruv Gelda; Zhongfang Zhuang; Yan Zheng; Liang Gou; Wei Zhang

    Representation learning is a fundamental building block for analyzing entities in a database. While the existing embedding learning methods are effective in various data mining problems, their applicability is often limited because these methods have pre-determined assumptions on the type of semantics captured by the learned embeddings, and the assumptions may not well align with specific downstream

    Updated: 2020-09-24
  • There is No Such Thing as an "Index"! or: The next 500 Indexing Papers
    arXiv.cs.DB Pub Date : 2020-09-22
    Jens Dittrich; Joris Nix; Christian Schön

    Index structures are a building block of query processing and computer science in general. Since the dawn of computer technology there have been index structures. And since then, a myriad of index structures are being invented and published each and every year. In this paper we argue that the very idea of "inventing an index" is a misleading concept in the first place. It is the analogue of "inventing

    Updated: 2020-09-23
  • Efficiently Finding a Maximal Clique Summary via Effective Sampling
    arXiv.cs.DB Pub Date : 2020-09-22
    Xiaofan Li; Rui Zhou; Lu Chen; Chengfei Liu; Qiang He; Yun Yang

    Maximal clique enumeration (MCE) is a fundamental problem in graph theory and is used in many applications, such as social network analysis, bioinformatics, intelligent agent systems, cyber security, etc. Most existing MCE algorithms focus on improving the efficiency rather than reducing the output size. The output unfortunately could consist of a large number of maximal cliques. In this paper, we

    Updated: 2020-09-23
  • Scalable Data Series Subsequence Matching with ULISSE
    arXiv.cs.DB Pub Date : 2020-09-22
    Michele Linardi; Themis Palpanas

    Data series similarity search is an important operation and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data

    Updated: 2020-09-23
  • Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey
    arXiv.cs.DB Pub Date : 2020-09-22
    Waqas Ali; Muhammad Saleem; Bin Yao; Aidan Hogan; Axel-Cyrille Ngonga Ngomo

    The recent advancements of the Semantic Web and Linked Data have changed the working of the traditional web. There is significant adoption of the Resource Description Framework (RDF) format for saving of web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ various mechanisms to implement critical

    Updated: 2020-09-23
  • Bandits Under The Influence (Extended Version)
    arXiv.cs.DB Pub Date : 2020-09-21
    Silviu Maniu; Stratis Ioannidis; Bogdan Cautis

    Recommender systems should adapt to user interests as the latter evolve. A prevalent cause for the evolution of user interests is the influence of their social circle. In general, when the interests are not known, online algorithms that explore the recommendation space while also exploiting observed preferences are preferable. We present online recommendation algorithms rooted in the linear multi-armed

    Updated: 2020-09-23
  • Data mining and time series segmentation via extrema: preliminary investigations
    arXiv.cs.DB Pub Date : 2020-09-02
    Michel Fliess; Cédric Join

    Time series segmentation is one of the many data mining tools. This paper, in French, takes local extrema as perceptually interesting points (PIPs). The blurring of those PIPs by the quick fluctuations around any time series is treated via an additive decomposition theorem, due to Cartier and Perrin, and algebraic estimation techniques, which are already useful in automatic control and signal processing

    Updated: 2020-09-22
  • Selectivity correction with online machine learning
    arXiv.cs.DB Pub Date : 2020-09-21
    Max Halford; Philippe Saint-Pierre; Franck Morvan

    Computer systems are full of heuristic rules which drive the decisions they make. These rules of thumb are designed to work well on average, but ignore specific information about the available context, and are thus sub-optimal. The emerging field of machine learning for systems attempts to learn decision rules with machine learning algorithms. In the database community, many recent proposals have been

    Updated: 2020-09-22
  • Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks
    arXiv.cs.DB Pub Date : 2020-09-21
    Max Halford; Philippe Saint-Pierre; Franck Morvan

    Relational query optimisers rely on cost models to choose between different query execution plans. Selectivity estimates are known to be a crucial input to the cost model. In practice, standard selectivity estimation procedures are prone to large errors. This is mostly because they rely on the so-called attribute value independence and join uniformity assumptions. Therefore, multidimensional methods

    Updated: 2020-09-22
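
To make the attribute value independence assumption mentioned in the abstract above concrete, here is a minimal worked example. It is not taken from the paper: the table contents and the helper `selectivity` are hypothetical. Under independence, the selectivity of a conjunction is estimated as the product of the per-attribute selectivities, which goes wrong when the attributes are correlated.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

# Hypothetical table of (city, country) rows; city determines country,
# so the two attributes are strongly correlated.
rows: List[Tuple[str, str]] = [
    ("Paris", "FR"),
    ("Paris", "FR"),
    ("Lyon", "FR"),
    ("Berlin", "DE"),
]


def selectivity(table: List[Tuple[str, str]],
                pred: Callable[[Tuple[str, str]], bool]) -> Fraction:
    """Fraction of rows in `table` that satisfy `pred`."""
    return Fraction(sum(1 for r in table if pred(r)), len(table))


sel_city = selectivity(rows, lambda r: r[0] == "Paris")
sel_country = selectivity(rows, lambda r: r[1] == "FR")

# Estimate under attribute value independence: multiply the marginals.
avi_estimate = sel_city * sel_country

# Actual selectivity of the conjunctive predicate.
true_selectivity = selectivity(rows, lambda r: r[0] == "Paris" and r[1] == "FR")
```

Here the independence-based estimate is 3/8 while the true selectivity is 1/2, because every "Paris" row is also an "FR" row.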
  • Towards application-specific query processing systems
    arXiv.cs.DB Pub Date : 2020-09-21
    Dimitrios Vasilas (DELYS, SU); Marc Shapiro (DELYS, SU); Bradley King (DELYS, SU); Sara Hamouda (DELYS, SU)

    Database systems use query processing subsystems for enabling efficient query-based data retrieval. An essential aspect of designing any query-intensive application is tuning the query system to fit the application's requirements and workload characteristics. However, the configuration parameters provided by traditional database systems do not cover the design decisions and trade-offs that arise from

    Updated: 2020-09-22
  • TODS: An Automated Time Series Outlier Detection System
    arXiv.cs.DB Pub Date : 2020-09-18
    Kwei-Harng Lai; Daochen Zha; Guanchu Wang; Junjie Xu; Yue Zhao; Devesh Kumar; Yile Chen; Purav Zumkhawaka; Minyang Wan; Diego Martinez; Xia Hu

    We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is the primitive, which is an implementation of a function with hyperparameters. TODS currently supports 70 primitives, including data processing, time series processing, feature analysis,

    Updated: 2020-09-22
  • SPChain: Blockchain-based Medical Data Sharing and Privacy-preserving eHealth System
    arXiv.cs.DB Pub Date : 2020-09-21
    Renpeng Zou (School of Cyber Engineering, Xidian University, Xian, China); Xixiang Lv (School of Cyber Engineering, Xidian University, Xian, China); Jingsong Zhao (School of Cyber Engineering, Xidian University, Xian, China)

    The development of eHealth systems has brought great convenience to people's life. Researchers have been combining new technologies to make eHealth systems work better for patients. The Blockchain-based eHealth system becomes popular because of its unique distributed tamper-resistant and privacy-preserving features. However, due to the security issues of the blockchain system, there are many security

    Updated: 2020-09-22
  • Consistency, Acyclicity, and Positive Semirings
    arXiv.cs.DB Pub Date : 2020-09-20
    Albert Atserias; Phokion G. Kolaitis

    In several different settings, one comes across situations in which the objects of study are locally consistent but globally inconsistent. Earlier work about probability distributions by Vorob'ev (1962) and about database relations by Beeri, Fagin, Maier, Yannakakis (1983) produced characterizations of when local consistency always implies global consistency. Towards a common generalization of these

    Updated: 2020-09-22
  • SYNC: A Copula based Framework for Generating Synthetic Data from Aggregated Sources
    arXiv.cs.DB Pub Date : 2020-09-20
    Zheng Li; Yue Zhao; Jialin Fu

    A synthetic dataset is a data object that is generated programmatically, and it may be valuable for creating a single dataset from multiple sources when direct collection is difficult or costly. Although it is a fundamental step for many data science tasks, an efficient and standard framework is absent. In this paper, we study a specific synthetic data generation task called downscaling, a procedure

    Updated: 2020-09-22
  • Answering Counting Queries over DL-Lite Ontologies
    arXiv.cs.DB Pub Date : 2020-09-02
    Meghyn Bienvenu (UB, CNRS, Bordeaux INP, LaBRI); Quentin Manière (UB, CNRS, Bordeaux INP, LaBRI); Michaël Thomazo (VALDA)

    Ontology-mediated query answering (OMQA) is a promising approach to data access and integration that has been actively studied in the knowledge representation and database communities for more than a decade. The vast majority of work on OMQA focuses on conjunctive queries, whereas more expressive queries that feature counting or other forms of aggregation remain largely unexplored. In this paper,

    Updated: 2020-09-22
  • SHACL Satisfiability and Containment (Extended Paper)
    arXiv.cs.DB Pub Date : 2020-08-31
    Paolo Pareti; George Konstantinidis; Fabio Mogavero; Timothy J. Norman

    The Shapes Constraint Language (SHACL) is a recent W3C recommendation language for validating RDF data. Specifically, SHACL documents are collections of constraints that enforce particular shapes on an RDF graph. Previous work on the topic has provided theoretical and practical results for the validation problem, but did not consider the standard decision problems of satisfiability and containment

    Updated: 2020-09-22
  • Multi-source Data Mining for e-Learning
    arXiv.cs.DB Pub Date : 2020-09-17
    Julie Bu Daher; Armelle Brun; Anne Boyer

    Data mining is the task of discovering interesting, unexpected or valuable structures in large datasets and transforming them into an understandable structure for further use. Different approaches in the domain of data mining have been proposed, among which pattern mining is the most important one. Pattern mining involves extracting interesting frequent patterns from data. Pattern mining has

    Updated: 2020-09-21
  • Extensible Data Skipping
    arXiv.cs.DB Pub Date : 2020-09-17
    Paula Ta-Shma; Guy Khazma; Gal Lushi; Oshrit Feder

    Data skipping reduces I/O for SQL queries by skipping over irrelevant data objects (files) based on their metadata. We extend this notion by allowing developers to define their own data skipping metadata types and indexes using a flexible API. Our framework is the first to natively support data skipping for arbitrary data types (e.g. geospatial, logs) and queries with User Defined Functions (UDFs)

    Updated: 2020-09-20
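
The basic idea of data skipping described in the abstract above can be sketched with the simplest kind of metadata, a per-file minimum and maximum of one column. This is an illustrative sketch only and does not use the paper's API: `FileMeta`, `files_to_scan`, and the file paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileMeta:
    """Hypothetical per-file metadata: value range of one numeric column."""
    path: str
    min_val: float
    max_val: float


def files_to_scan(metas: List[FileMeta], lo: float, hi: float) -> List[str]:
    """Return paths of files that may contain rows with lo <= value <= hi.

    A file is skipped when its [min_val, max_val] range cannot overlap
    the query range, so no I/O is spent on it.
    """
    return [m.path for m in metas if m.max_val >= lo and m.min_val <= hi]


metas = [
    FileMeta("part-0.parquet", 0.0, 9.9),
    FileMeta("part-1.parquet", 10.0, 19.9),
    FileMeta("part-2.parquet", 20.0, 29.9),
]

# Equivalent of: SELECT ... WHERE value BETWEEN 12 AND 21
selected = files_to_scan(metas, 12.0, 21.0)
```

Skipping is safe because the metadata can only produce false positives (a file is read but holds no matching row), never false negatives.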
  • Large-Scale Intelligent Microservices
    arXiv.cs.DB Pub Date : 2020-09-17
    Mark Hamilton; Nick Gonsalves; Christina Lee; Anand Raman; Brendan Walsh; Siddhartha Prasad; Dalitso Banda; Lucy Zhang; Lei Zhang; William T. Freeman

    Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with their own restrictive syntax. We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services

    Updated: 2020-09-20
  • Semantic Property Graph for Scalable Knowledge Graph Analytics
    arXiv.cs.DB Pub Date : 2020-09-16
    Sumit Purohit; Nhuy Van

    Graphs are a natural and fundamental representation of describing the activities, relationships, and evolution of various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Resource Description Framework (RDF) and Labeled Property Graph (LPG) are two of the most used data

    Updated: 2020-09-18
  • Faster Property Testers in a Variation of the Bounded Degree Model
    arXiv.cs.DB Pub Date : 2020-09-16
    Isolde Adler; Polly Fahey

    Property testing algorithms are highly efficient algorithms that come with probabilistic accuracy guarantees. For a property P, the goal is to correctly distinguish, with high probability, inputs that have P from those that are far from having P, by querying only a small number of local parts of the input. In property testing on graphs, the distance is measured by the number of edge modifications (additions

    Updated: 2020-09-18
  • Grounded Adaptation for Zero-shot Executable Semantic Parsing
    arXiv.cs.DB Pub Date : 2020-09-16
    Victor Zhong; Mike Lewis; Sida I. Wang; Luke Zettlemoyer

    We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g. new database schemas). GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data-augmentation

    Updated: 2020-09-18
  • CorDEL: A Contrastive Deep Learning Approach for Entity Linkage
    arXiv.cs.DB Pub Date : 2020-09-15
    Zhengyang Wang; Bunyamin Sisman; Hao Wei; Xin Luna Dong; Shuiwang Ji

    Entity linkage (EL) is a critical problem in data cleaning and integration. In the past several decades, EL has typically been done by rule-based systems or traditional machine learning models with hand-curated features, both of which heavily depend on manual human inputs. With the ever-increasing growth of new data, deep learning (DL) based approaches have been proposed to alleviate the high cost

    Updated: 2020-09-16
  • Revealing Secrets in SPARQL Session Level
    arXiv.cs.DB Pub Date : 2020-09-13
    Xinyue Zhang; Meng Wang; Muhammad Saleem; Axel-Cyrille Ngonga Ngomo; Guilin Qi; Haofen Wang

    Based on Semantic Web technologies, knowledge graphs help users to discover information of interest by using live SPARQL services. Answer-seekers often examine intermediate results iteratively and modify SPARQL queries repeatedly in a search session. In this context, understanding user behaviors is critical for effective intention prediction and query optimization. However, these behaviors have not

    Updated: 2020-09-16
  • SPARQL with XQuery-based Filtering
    arXiv.cs.DB Pub Date : 2020-09-14
    Takahiro Komamizu

    Linked Open Data (LOD) has proliferated across various domains; however, there is still a lot of open data in formats other than RDF, a standard data description framework in LOD. These open data can also be connected to entities in LOD when they are associated with URIs. Document-centric XML data are such open data that are connected with entities in LOD as supplemental documents for these

    Updated: 2020-09-15
  • A Simple and Efficient Framework for Identifying Relation-gaps in Ontologies
    arXiv.cs.DB Pub Date : 2020-09-12
    Subhashree S; P Sreenivasa Kumar

    Though many ontologies have a huge number of classes, in most cases one cannot find a good number of object properties connecting the classes. Adding object properties makes an ontology richer and more applicable for tasks such as Question Answering. In this context, the question of which two classes should be considered for discovering object properties becomes very important. We address the

    Updated: 2020-09-15
  • Answering Multi-Dimensional Range Queries under Local Differential Privacy
    arXiv.cs.DB Pub Date : 2020-09-14
    Jianyu Yang; Tianhao Wang; Ninghui Li; Xiang Cheng; Sen Su

    In this paper, we tackle the problem of answering multi-dimensional range queries under local differential privacy. There are three key technical challenges: capturing the correlations among attributes, avoiding the curse of dimensionality, and dealing with the large domains of attributes. None of the existing approaches satisfactorily deals with all three challenges. Overcoming these three challenges

    Updated: 2020-09-15
  • Utility-Optimized Synthesis of Differentially Private Location Traces
    arXiv.cs.DB Pub Date : 2020-09-14
    Mehmet Emre Gursoy; Vivekanand Rajasekar; Ling Liu

    Differentially private location trace synthesis (DPLTS) has recently emerged as a solution to protect mobile users' privacy while enabling the analysis and sharing of their location traces. A key challenge in DPLTS is to best preserve the utility in location trace datasets, which is non-trivial considering the high dimensionality, complexity and heterogeneity of datasets, as well as the diverse types

    Updated: 2020-09-15
  • An Open-Source Integration of Process Mining Features into the Camunda Workflow Engine: Data Extraction and Challenges
    arXiv.cs.DB Pub Date : 2020-09-14
    Alessandro Berti; Wil van der Aalst; David Zang; Magdalena Lang

    Process mining provides techniques to improve the performance and compliance of operational processes. Although sometimes the term "workflow mining" is used, the application in the context of Workflow Management (WFM) and Business Process Management (BPM) systems is limited. The main reason is that WFM/BPM systems control the process, leaving less room for flexibility and the corresponding deviations

    Updated: 2020-09-15
  • Accelerating COVID-19 Differential Diagnosis with Explainable Ultrasound Image Analysis
    arXiv.cs.DB Pub Date : 2020-09-13
    Jannis Born; Nina Wiedemann; Gabriel Brändle; Charlotte Buhre; Bastian Rieck; Karsten Borgwardt

    Controlling the COVID-19 pandemic largely hinges upon the existence of fast, safe, and highly-available diagnostic tools. Ultrasound, in contrast to CT or X-Ray, has many practical advantages and can serve as a globally-applicable first-line examination technique. We provide the largest publicly available lung ultrasound (US) dataset for COVID-19 consisting of 106 videos from three classes (COVID-19

    Updated: 2020-09-15
  • Finite Horn Monoids and Near-Semirings
    arXiv.cs.DB Pub Date : 2020-09-12
    Christian Antic

    Describing complex objects as the composition of elementary ones is a common strategy in computer science and science in general. This paper contributes to the foundations of knowledge representation and database theory by introducing and studying the sequential composition of propositional Horn theories. Specifically, we show that the notion of composition gives rise to a family of monoids and near-semirings

    Updated: 2020-09-15
  • GeoSPARQL+: Syntax, Semantics and System for Integrated Querying of Graph, Raster and Vector Data -- Technical Report
    arXiv.cs.DB Pub Date : 2020-09-10
    Timo Homburg; Steffen Staab; Daniel Janke

    We introduce an approach to semantically represent and query raster data in a Semantic Web graph. We extend the GeoSPARQL vocabulary and query language to support raster data as a new type of geospatial data. We define new filter functions and illustrate our approach using several use cases on real-world data sets. Finally, we describe a prototypical implementation and validate the feasibility of our

    Updated: 2020-09-11
  • Subscribing to Big Data at Scale
    arXiv.cs.DB Pub Date : 2020-09-10
    Xikui Wang; Michael J. Carey; Vassilis J. Tsotras

    Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy

    Updated: 2020-09-11
  • Task-agnostic Indexes for Deep Learning-based Queries over Unstructured Data
    arXiv.cs.DB Pub Date : 2020-09-09
    Daniel Kang; John Guibas; Peter Bailis; Tatsunori Hashimoto; Matei Zaharia

    Unstructured data is now commonly queried by using target deep neural networks (DNNs) to produce structured information, e.g., object types and positions in video. As these target DNNs can be computationally expensive, recent work uses proxy models to produce query-specific proxy scores. These proxy scores are then used in downstream query processing algorithms for improved query execution speeds.

    Updated: 2020-09-11
  • A Survey on Data Pricing: from Economics to Data Science
    arXiv.cs.DB Pub Date : 2020-09-09
    Jian Pei

    It is well recognized that data are invaluable. How can we assess the value of data objectively, systematically and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics, marketing, electronic commerce, data management, data mining and machine learning. In this article, we present a unified, interdisciplinary

    Updated: 2020-09-11
  • Graph-based keyword search in heterogeneous data sources
    arXiv.cs.DB Pub Date : 2020-09-09
    Mhd Yamen Haddad (CEDAR); Angelos Anadiotis (CEDAR); Yamen Mhd (CEDAR); Ioana Manolescu (CEDAR)

    Data journalism is the field of investigative journalism which focuses on digital data by treating them as first-class citizens. Following the trends in human activity, which leaves strong digital traces, data journalism becomes increasingly important. However, as the number and the diversity of data sources increase, heterogeneous data models with different structure, or even no structure at all,

    Updated: 2020-09-10
  • Sequenced Route Query with Semantic Hierarchy
    arXiv.cs.DB Pub Date : 2020-09-08
    Yuya Sasaki; Yoshiharu Ishikawa; Yasuhiro Fujiwara; Makoto Onizuka

    The trip planning query searches for preferred routes starting from a given point through multiple Points of Interest (PoIs) that match user requirements. Although previous studies have investigated trip planning queries, they lack flexibility for finding routes because all of them output routes that strictly match user requirements. We study trip planning queries that output multiple routes in a flexible

    Updated: 2020-09-10
  • Leam: An Interactive System for In-situ Visual Text Analysis
    arXiv.cs.DB Pub Date : 2020-09-08
    Sajjadur Rahman; Peter Griggs; Çağatay Demiralp

    With the increase in scale and availability of digital text generated on the web, enterprises such as online retailers and aggregators often use text analytics to mine and analyze the data to improve their services and products alike. Text data analysis is an iterative, non-linear process with diverse workflows spanning multiple stages, from data cleaning to visualization. Existing text analytics systems

    Updated: 2020-09-10
  • Conquery: an open source application to analyze high content healthcare data
    arXiv.cs.DB Pub Date : 2020-09-07
    Fabian Kovacs; Max Thonagel; Marion Ludwig; Alexander Albrecht; Hannes Priehn; Manuel Hegner; Dirk Enders; Lennart Hickstein; Maximilian von Knobloch; Anne Rothhardt; Jochen Walker

    Background: Big data in healthcare must be exploited to achieve a substantial increase in efficiency and competitiveness. Especially the analysis of patient-related data possesses huge potential to considerably improve decision-making processes in the healthcare sector. Most analytical approaches used today are highly time- and resource-consuming. The presented software solution Conquery is an open

    Updated: 2020-09-10
  • A Lightweight Algorithm to Uncover Deep Relationships in Data Tables
    arXiv.cs.DB Pub Date : 2020-09-07
    Jin Cao; Yibo Zhao; Linjun Zhang; Jason Li

    Many data we collect today are in tabular form, with rows as records and columns as attributes associated with each record. Understanding the structural relationship in tabular data can greatly facilitate the data science process. Traditionally, much of this relational information is stored in table schema and maintained by its creators, usually domain experts. In this paper, we develop automated methods

    Updated: 2020-09-10
  • Universal Layout Emulation for Long-Term Database Archival
    arXiv.cs.DB Pub Date : 2020-09-06
    Raja Appuswamy; Vincent Joguin

    Research on alternate media technologies, like film, synthetic DNA, and glass, for long-term data archival has received a lot of attention recently due to the media obsolescence issues faced by contemporary storage media like tape, Hard Disk Drives (HDD), and Solid State Disks (SSD). While researchers have developed novel layout and encoding techniques for archiving databases on these new media types

    Updated: 2020-09-08
Contents have been reproduced by permission of the publishers.