Current journal: The VLDB Journal
  • Scalable algorithms for signal reconstruction by leveraging similarity joins
    VLDB J. (IF 1.973) Pub Date : 2019-08-14
    Abolfazl Asudeh, Jees Augustine, Azade Nazi, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das, Divesh Srivastava

    Abstract Signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an underdetermined system of linear equations that is closest to a given prior. It has a substantial number of applications in diverse areas including network traffic engineering, medical image reconstruction, acoustics, astronomy and many more. Most common approaches for

    Updated: 2020-03-19
  • Joins on high-bandwidth memory: a new level in the memory hierarchy
    VLDB J. (IF 1.973) Pub Date : 2019-07-13
    Constantin Pohl, Kai-Uwe Sattler, Goetz Graefe

    Abstract High-bandwidth memory (HBM) gives an additional opportunity for hardware performance benefits. The high available bandwidth compared to regular DRAM allows execution of many threads in parallel, avoiding memory stalls through many concurrent memory accesses. This is especially interesting considering database join algorithms optimized for multicore CPUs, even more when running on a manycore

    Updated: 2020-03-19
  • An analytical study of large SPARQL query logs
    VLDB J. (IF 1.973) Pub Date : 2019-08-02
    Angela Bonifati, Wim Martens, Thomas Timm

    Abstract With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end users and harvested from large and up-to-date structured query logs from a wide variety of RDF data sources. As opposed to previous studies

    Updated: 2020-03-19
  • Efficient compute node-local replication mechanisms for NVRAM-centric data structures
    VLDB J. (IF 1.973) Pub Date : 2019-07-10
    Mikhail Zarubin, Thomas Kissinger, Dirk Habich, Thomas Willhalm, Wolfgang Lehner

    Abstract The long-awaited nonvolatile random-access memory technology NVRAM is finally publicly available on the market and requires significant changes to the architecture of in-memory database systems. Since such hybrid DRAM–NVRAM database systems may be able to keep the primary data solely persistent in the NVRAM, efficient replication mechanisms need to be considered to prevent base data losses

    Updated: 2020-03-19
  • Morton filters: fast, compressed sparse cuckoo filters
    VLDB J. (IF 1.973) Pub Date : 2019-08-06
    Alex D. Breslow, Nuwan S. Jayasena

    Abstract Approximate set membership data structures (ASMDSs) are ubiquitous in computing. They trade a tunable, often small, error rate (\(\epsilon \)) for large space savings. The canonical ASMDS is the Bloom filter, which supports lookups and insertions but not deletions in its simplest form. Cuckoo filters (CFs), a recently proposed class of ASMDSs, add deletion support and often use fewer bits

    Updated: 2020-03-19
  • Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines
    VLDB J. (IF 1.973) Pub Date : 2019-07-16
    Harald Lang, Linnea Passing, Andreas Kipf, Peter Boncz, Thomas Neumann, Alfons Kemper

    Abstract Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for the compilation of data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes the underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the

    Updated: 2020-03-19
  • Snorkel: rapid training data creation with weak supervision
    VLDB J. (IF 1.973) Pub Date : 2019-07-15
    Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

    Abstract Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs

    Updated: 2020-03-19
  • The ubiquity of large graphs and surprising challenges of graph processing: extended survey
    VLDB J. (IF 1.973) Pub Date : 2019-06-29
    Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, M. Tamer Özsu

    Abstract Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We performed an extensive study that consisted of an online survey of 89 users, a review of the mailing lists, source repositories, and white papers of a large suite of graph software products, and in-person

    Updated: 2020-03-19
  • Diversified spatial keyword search on RDF data
    VLDB J. (IF 1.973) Pub Date : 2020-03-12
    Zhi Cai, Georgios Kalamatianos, Georgios J. Fakas, Nikos Mamoulis, Dimitris Papadias

    Abstract The abundance and ubiquity of RDF data (such as DBpedia and YAGO2) necessitate their effective and efficient retrieval. For this purpose, keyword search paradigms liberate users from understanding the RDF schema and the SPARQL query language. Popular RDF knowledge bases (e.g., YAGO2) also include spatial semantics that enable location-based search. In an earlier location-based keyword search

    Updated: 2020-03-12
  • Context-aware, preference-based vehicle routing
    VLDB J. (IF 1.973) Pub Date : 2020-03-11
    Chenjuan Guo, Bin Yang, Jilin Hu, Christian S. Jensen, Lu Chen

    Abstract Vehicle routing is an important service that is used by both private individuals and commercial enterprises. Drivers may have different contexts that are characterized by different routing preferences. For example, during different times of day or weather conditions, drivers may make different routing decisions such as preferring or avoiding highways. The increasing availability of vehicle

    Updated: 2020-03-12
  • TurboLift: fast accuracy lifting for historical data recovery
    VLDB J. (IF 1.973) Pub Date : 2020-03-09
    Fan Yang, Faisal M. Almutairi, Hyun Ah Song, Christos Faloutsos, Nicholas D. Sidiropoulos, Vladimir Zadorozhny

    Abstract Historical data are frequently involved in situations where the available reports on time series are temporally aggregated at different levels, e.g., the monthly counts of people infected with measles. In real databases, the time periods covered by different reports can have overlaps (i.e., time-ticks covered by more than one report) or gaps (i.e., time-ticks not covered by any report). However

    Updated: 2020-03-10
  • Top-k term publish/subscribe for geo-textual data streams
    VLDB J. (IF 1.973) Pub Date : 2020-03-09
    Lisi Chen, Shuo Shang, Christian S. Jensen, Jianliang Xu, Panos Kalnis, Bin Yao, Ling Shao

    Abstract Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. With streams of such data, which includes check-ins and geo-tagged tweets, available, users may be interested in being kept up-to-date on which terms are popular in the streams in a particular region of space. To enable this functionality, we aim at efficiently processing two

    Updated: 2020-03-09
  • Efficient (α, β)-core computation in bipartite graphs
    VLDB J. (IF 1.973) Pub Date : 2020-03-04
    Boge Liu, Long Yuan, Xuemin Lin, Lu Qin, Wenjie Zhang, Jingren Zhou

    Abstract The problem of computing the (\(\alpha, \beta\))-core in a bipartite graph for given \(\alpha\) and \(\beta\) is a fundamental problem in bipartite graph analysis and can be used in many applications such as online group recommendation and fraudster detection. The existing solution to computing the (\(\alpha, \beta\))-core needs to traverse the entire bipartite graph once and ignores the fact that

    Updated: 2020-03-04
  • Architecture of a distributed storage that combines file system, memory and computation in a single layer
    VLDB J. (IF 1.973) Pub Date : 2020-02-26
    Jia Zou, Arun Iyengar, Chris Jermaine

    Abstract Storage and memory systems for modern data analytics are heavily layered, managing shared persistent data, cached data, and non-shared execution data in separate systems such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper, we propose a

    Updated: 2020-02-26
  • Finding k-shortest paths with limited overlap
    VLDB J. (IF 1.973) Pub Date : 2020-02-21
    Theodoros Chondrogiannis, Panagiotis Bouros, Johann Gamper, Ulf Leser, David B. Blumenthal

    Abstract In this paper, we investigate the computation of alternative paths between two locations in a road network. More specifically, we study the k-shortest paths with limited overlap (\(k\text {SPwLO}\)) problem that aims at finding a set of k paths such that all paths are sufficiently dissimilar to each other and as short as possible. To compute \(k\text {SPwLO}\) queries, we propose two exact

    Updated: 2020-02-23
  • Efficient maximum clique computation and enumeration over large sparse graphs
    VLDB J. (IF 1.973) Pub Date : 2020-02-15
    Lijun Chang

    Abstract This paper studies the problem of maximum clique computation (MCC) over sparse graphs, as large real-world graphs are usually sparse. In the literature, the problem of MCC over sparse graphs has been studied separately and less extensively than its dense counterpart—MCC over dense graphs—and advanced algorithmic techniques that are developed for MCC over dense graphs have not been utilized

    Updated: 2020-02-18
  • FERRARI: an efficient framework for visual exploratory subgraph search in graph databases
    VLDB J. (IF 1.973) Pub Date : 2020-01-30
    Chaohui Wang, Miao Xie, Sourav S. Bhowmick, Byron Choi, Xiaokui Xiao, Shuigeng Zhou

    Abstract The exploratory search paradigm assists users who do not have a clear search intent and are unfamiliar with the underlying data space. Query formulation evolves iteratively in this paradigm as a user becomes more familiar with the content. Although exploratory search has received significant attention recently in the context of structured data, scant attention has been paid to graph-structured

    Updated: 2020-01-31
  • LSM-based storage techniques: a survey
    VLDB J. (IF 1.973) Pub Date : 2019-07-19
    Chen Luo, Michael J. Carey

    Abstract Recently, the log-structured merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this, there have been a large number of research efforts, from both the database community and the operating systems community, that try to improve various aspects of LSM-trees. In this paper, we provide a survey of recent research efforts on LSM-trees

    Updated: 2020-01-31
  • In-memory database acceleration on FPGAs: a survey
    VLDB J. (IF 1.973) Pub Date : 2019-10-26
    Jian Fang, Yvo T. B. Mulder, Jan Hidders, Jinho Lee, H. Peter Hofstee

    Abstract While FPGAs have seen prior use in database systems, in recent years interest in using FPGA to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs, which can also provide high throughput, and are

    Updated: 2020-01-31
  • A survey of trajectory distance measures and performance evaluation
    VLDB J. (IF 1.973) Pub Date : 2019-10-18
    Han Su, Shuncheng Liu, Bolong Zheng, Xiaofang Zhou, Kai Zheng

    Abstract The proliferation of trajectory data in various application domains has inspired tremendous research efforts to analyze large-scale trajectory data from a variety of aspects. A fundamental ingredient of these trajectory analysis tasks and applications is distance measures for effectively determining how similar two trajectories are. We conduct a comprehensive survey of the trajectory distance

    Updated: 2020-01-31
  • Comparing heuristics for graph edit distance computation
    VLDB J. (IF 1.973) Pub Date : 2019-07-15
    David B. Blumenthal, Nicolas Boria, Johann Gamper, Sébastien Bougleux, Luc Brun

    Abstract Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error correction, local search, and linear programming

    Updated: 2020-01-31
  • Explaining Natural Language query results
    VLDB J. (IF 1.973) Pub Date : 2019-11-02
    Daniel Deutch, Nave Frost, Amir Gilad

    Abstract Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for

    Updated: 2020-01-31
  • A survey of community search over big graphs
    VLDB J. (IF 1.973) Pub Date : 2019-07-20
    Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, Xuemin Lin

    Abstract With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many real applications, such as event organization

    Updated: 2020-01-31
  • Spatial crowdsourcing: a survey
    VLDB J. (IF 1.973) Pub Date : 2019-08-29
    Yongxin Tong, Zimu Zhou, Yuxiang Zeng, Lei Chen, Cyrus Shahabi

    Abstract Crowdsourcing is a computing paradigm where humans are actively involved in a computing task, especially for tasks that are intrinsically easier for humans than for computers. Spatial crowdsourcing is an increasingly popular category of crowdsourcing in the era of mobile Internet and sharing economy, where tasks are spatiotemporal and must be completed at a specific location and time. In fact

    Updated: 2020-01-31
  • An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query
    VLDB J. (IF 1.973) Pub Date : 2019-09-14
    Min Xie, Raymond Chi-Wing Wong, Ashwin Lall

    Abstract When faced with a database containing millions of tuples, a user may be only interested in a (typically much) smaller representative subset. Recently, a query called the regret minimization query was proposed toward this purpose to create such a subset for users. Specifically, this query finds a set of tuples that minimizes the user regret (measured by how far the user’s favorite tuple in

    Updated: 2020-01-31
  • EntropyDB: a probabilistic approach to approximate query processing
    VLDB J. (IF 1.973) Pub Date : 2019-11-02
    Laurel Orr, Magdalena Balazinska, Dan Suciu

    Abstract We present EntropyDB, an interactive data exploration system that uses a probabilistic approach to generate a small, query-able summary of a dataset. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulation

    Updated: 2020-01-31
  • Event modeling and mining: a long journey toward explainable events
    VLDB J. (IF 1.973) Pub Date : 2019-07-01
    Xinhong Chen, Qing Li

    Abstract Recently, research on event management has again drawn much attention and made great progress. As the core tasks of event management, event modeling and mining are essential for accessing and utilizing events effectively. In this survey, we provide a detailed review of event modeling and event mining. Based on a general definition, different characteristics of events are described, along with

    Updated: 2020-01-31
  • Indexing in flash storage devices: a survey on challenges, current approaches, and future trends
    VLDB J. (IF 1.973) Pub Date : 2019-08-03
    Athanasios Fevgas, Leonidas Akritidis, Panayiotis Bozanis, Yannis Manolopoulos

    Abstract Indexes are special-purpose data structures, designed to facilitate and speed up access to the contents of a file. Indexing has been actively and extensively investigated in DBMSes equipped with hard disk drives (HDDs). In recent years, solid-state drives (SSDs), based on NAND flash technology, started replacing magnetic disks due to their appealing characteristics: high throughput/low

    Updated: 2020-01-31
  • SKCompress: compressing sparse and nonuniform gradient in distributed machine learning
    VLDB J. (IF 1.973) Pub Date : 2020-01-01
    Jiawei Jiang, Fangcheng Fu, Tong Yang, Yingxia Shao, Bin Cui

    Abstract Distributed machine learning (ML) has been extensively studied to meet the explosive growth of training data. A wide range of machine learning models are trained by a family of first-order optimization algorithms, i.e., stochastic gradient descent (SGD). The core operation of SGD is the calculation of gradients. When executing SGD in a distributed environment, the workers need to exchange

    Updated: 2020-01-06
  • OrpheusDB: bolt-on versioning for relational databases (extended version)
    VLDB J. (IF 1.973) Pub Date : 2019-12-20
    Silu Huang, Liqi Xu, Jialin Liu, Aaron J. Elmore, Aditya Parameswaran

    Abstract Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce OrpheusDB, a dataset version control system that “bolts on” versioning capabilities

    Updated: 2020-01-06
  • Efficient query autocompletion with edit distance-based error tolerance
    VLDB J. (IF 1.973) Pub Date : 2019-12-14
    Jianbin Qin, Chuan Xiao, Sheng Hu, Jie Zhang, Wei Wang, Yoshiharu Ishikawa, Koji Tsuda, Kunihiko Sadakane

    Query autocompletion is an important feature saving users many keystrokes from typing the entire query. In this paper, we study the problem of query autocompletion that tolerates errors in users’ input using edit distance constraints. Previous approaches index data strings in a trie, and continuously maintain all the prefixes of data strings whose edit distances from the query string are within the

    Updated: 2020-01-06
  • Efficient community discovery with user engagement and similarity
    VLDB J. (IF 1.973) Pub Date : 2019-10-26
    Fan Zhang, Xuemin Lin, Ying Zhang, Lu Qin, Wenjie Zhang

    Abstract In this paper, we investigate the problem of (k,r)-core which intends to find cohesive subgraphs on social networks considering both user engagement and similarity perspectives. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph) where each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we

    Updated: 2020-01-06
  • Efficient distributed reachability querying of massive temporal graphs
    VLDB J. (IF 1.973) Pub Date : 2019-09-28
    Tianming Zhang, Yunjun Gao, Lu Chen, Wei Guo, Shiliang Pu, Baihua Zheng, Christian S. Jensen

    Reachability computation is a fundamental graph functionality with a wide range of applications. In spite of this, little work has as yet been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks, such as communication networks, social networks, and transportation schedule networks. Moreover, we are faced with increasingly large real-world

    Updated: 2020-01-06
  • Skyline queries over incomplete data streams
    VLDB J. (IF 1.973) Pub Date : 2019-10-17
    Weilong Ren, Xiang Lian, Kambiz Ghazinour

    Abstract Nowadays, efficient and effective processing over massive stream data has attracted much attention from the database community, which is useful in many real applications such as sensor data monitoring, network intrusion detection, and so on. In practice, due to the malfunction of sensing devices or imperfect data collection techniques, real-world stream data may often contain missing or incomplete

    Updated: 2020-01-06
  • One-pass trajectory simplification using the synchronous Euclidean distance
    VLDB J. (IF 1.973) Pub Date : 2019-10-04
    Xuelian Lin, Jiahao Jiang, Shuai Ma, Yimeng Zuo, Chunming Hu

    Abstract Various mobile devices have been used to collect, store and transmit tremendous amounts of trajectory data, and it is known that raw trajectory data seriously wastes storage, network bandwidth and computing resources. To attack this issue, one-pass line simplification (\(\textsf{LS}\)) algorithms have been developed, by compressing data points in a trajectory to a set of continuous line segments

    Updated: 2020-01-06
  • Coconut: sortable summarizations for scalable indexes over static and streaming data series
    VLDB J. (IF 1.973) Pub Date : 2019-09-25
    Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

    Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance, or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing

    Updated: 2020-01-06
  • Parsing gigabytes of JSON per second
    VLDB J. (IF 1.973) Pub Date : 2019-10-11
    Geoff Langdale, Daniel Lemire

    Abstract JavaScript Object Notation or JSON is a ubiquitous data exchange format on the web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as possible. Despite the maturity of the problem of JSON parsing, we show that substantial speedups are possible. We present the first standard-compliant JSON parser

    Updated: 2020-01-06
  • Parallelizing approximate single-source personalized PageRank queries on shared memory
    VLDB J. (IF 1.973) Pub Date : 2019-10-08
    Runhui Wang, Sibo Wang, Xiaofang Zhou

    Abstract Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) \(\pi (s,t)\) measures the importance of node t with respect to node s. In this work, we study the single-source PPR query, which takes a source node s as input and outputs the PPR values of all nodes in G with respect to s. The single-source PPR query finds many important applications, e.g., community

    Updated: 2020-01-06
  • General dynamic Yannakakis: conjunctive queries with theta joins under updates
    VLDB J. (IF 1.973) Pub Date : 2019-11-19
    Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, Wolfgang Lehner

    Abstract The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. In prior work, we have proposed general dynamic Yannakakis (GDyn), a general framework for dynamically processing acyclic conjunctive queries with \(\theta \)-joins in the presence of data updates. Whereas traditional approaches face a trade-off between materialization of subresults

    Updated: 2020-01-06
  • Top-k relevant semantic place retrieval on spatiotemporal RDF data
    VLDB J. (IF 1.973) Pub Date : 2019-11-19
    Dingming Wu, Hao Zhou, Jieming Shi, Nikos Mamoulis

    Abstract RDF data are traditionally accessed using structured query languages, such as SPARQL. However, this requires users to understand the language as well as the RDF schema. Keyword search on RDF data aims at relieving users from these requirements; users only input a set of keywords, and the goal is to find small RDF subgraphs that contain all keywords. At the same time, popular RDF knowledge

    Updated: 2020-01-06
  • Making data visualization more efficient and effective: a survey
    VLDB J. (IF 1.973) Pub Date : 2019-11-19
    Xuedi Qin, Yuyu Luo, Nan Tang, Guoliang Li

    Data visualization is crucial in today’s data-driven business world, where it has been widely used to support decision making that is closely related to the major revenues of many industrial companies. However, due to the high demand of data processing w.r.t. the volume, velocity, and veracity of data, there is an emerging need for database experts to help with efficient and effective data visualization.

    Updated: 2020-01-06
  • Adaptive partitioning and indexing for in situ query processing
    VLDB J. (IF 1.973) Pub Date : 2019-11-15
    Matthaios Olma, Manos Karpathiotakis, Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki

    Abstract The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. To alleviate the loading cost, in situ query processing systems operate directly over raw data and offer instant access to data. At the same time, analytical workloads have

    Updated: 2020-01-06
  • Evaluating interactive data systems
    VLDB J. (IF 1.973) Pub Date : 2019-11-13
    Protiva Rahman, Lilong Jiang, Arnab Nandi

    Abstract Interactive query interfaces have become a popular tool for ad hoc data analysis and exploration. Compared with traditional systems that are optimized for throughput or batched performance, these systems focus more on user-centric interactivity. This poses a new class of performance challenges to the backend, which are further exacerbated by the advent of new interaction modes (e.g., touch

    Updated: 2020-01-06
  • Correction: A survey of community search over big graphs
    VLDB J. (IF 1.973) Pub Date : 2019-11-11
    Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, Xuemin Lin

    In the original article, Table 1 was published with incorrect figures. The correct Table 1 is given below

    Updated: 2020-01-06
  • Cleaning data with Llunatic
    VLDB J. (IF 1.973) Pub Date : 2019-11-08
    Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro

    Abstract Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy to repair conflicting values and are specialized toward specific

    Updated: 2020-01-06
  • The core decomposition of networks: theory, algorithms and applications
    VLDB J. (IF 1.973) Pub Date : 2019-11-04
    Fragkiskos D. Malliaros, Christos Giatsidis, Apostolos N. Papadopoulos, Michalis Vazirgiannis

    Abstract The core decomposition of networks has attracted significant attention due to its numerous applications in real-life problems. Simply stated, the core decomposition of a network (graph) assigns to each graph node v, an integer number c(v) (the core number), capturing how well v is connected with respect to its neighbors. This concept is strongly related to the concept of graph degeneracy,

    Updated: 2020-01-06
  • Adding data provenance support to Apache Spark
    VLDB J. (IF 1.973) Pub Date : 2019-04-23
    Matteo Interlandi, Ari Ekmekji, Kshitij Shah, Muhammad Ali Gulzar, Sai Deep Tetali, Miryung Kim, Todd Millstein, Tyson Condie

    Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To aid this effort, we built Titian, a library that enables data provenance-tracking

    Updated: 2019-11-01
  • On Differentially Private Frequent Itemset Mining
    VLDB J. (IF 1.973) Pub Date : 2013-09-17
    Chen Zeng, Jeffrey F Naughton, Jin-Yi Cai

    We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items). Accordingly

    Updated: 2019-11-01
  • Performance analysis of a dual-tree algorithm for computing spatial distance histograms
    VLDB J. (IF 1.973) Pub Date : 2011-08-02
    Shaoping Chen, Yi-Cheng Tu, Yuni Xia

    Many scientific and engineering fields produce large volumes of spatiotemporal data. The storage, retrieval, and analysis of such data pose great challenges to database systems design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytic, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery

    Updated: 2019-11-01
Contents have been reproduced by permission of the publishers.