Current journal: The VLDB Journal
  • GeoSparkViz : a cluster computing system for visualizing massive-scale geospatial data
    VLDB J. (IF 2.904) Pub Date : 2021-01-07
    Jia Yu, Mohamed Sarwat

    In the last decade, geospatial data extracted from GPS traces and satellite images has become ubiquitous. GeoVisual analytics, abbr. GeoViz, is the science of analytical reasoning assisted by geospatial map interfaces. GeoViz involves two phases: (1) spatial data processing, which loads spatial data and executes spatial queries to return the set of spatial objects to be visualized. (2) Map

  • DIFF: a relational interface for large-scale data explanation
    VLDB J. (IF 2.904) Pub Date : 2020-09-30
    Firas Abuzaid, Peter Kraft, Sahaana Suri, Edward Gan, Eric Xu, Atul Shenoy, Asvin Ananthanarayan, John Sheu, Erik Meijer, Xi Wu, Jeff Naughton, Peter Bailis, Matei Zaharia

    A range of explanation engines assist data analysts by performing feature selection over increasingly high-volume and high-dimensional data, grouping and highlighting commonalities among data points. While useful in diverse tasks such as user behavior analytics, operational event processing, and root-cause analysis, today’s explanation engines are designed as stand-alone data processing tools that

  • EI-LSH: An early-termination driven I/O efficient incremental c-approximate nearest neighbor search
    VLDB J. (IF 2.904) Pub Date : 2020-09-30
    Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, Lu Qin, Xuemin Lin

    Nearest neighbor in high-dimensional space has been widely used in various fields such as databases, data mining and machine learning. The problem has been well solved in low-dimensional space. However, when it comes to high-dimensional space, due to the curse of dimensionality, the problem is challenging. As a trade-off between accuracy and efficiency, c-approximate nearest neighbor (c-ANN) is considered

  • Building blocks for persistent memory
    VLDB J. (IF 2.904) Pub Date : 2020-09-23
    Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, Alfons Kemper

    I/O latency and throughput are two of the major performance bottlenecks for disk-based database systems. Persistent memory (PMem) technologies, like Intel’s Optane DC persistent memory modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide the first comprehensive performance evaluation of PMem on real hardware in

  • Gossip-based visibility control for high-performance geo-distributed transactions
    VLDB J. (IF 2.904) Pub Date : 2020-09-21
    Hua Fan, Wojciech Golab

    Providing ACID transactions under conflicts across globally distributed data is the Everest of transaction processing protocols. Transaction processing in this scenario is particularly costly due to the high latency of cross-continent network links, which inflates concurrency control and data replication overheads. To mitigate the problem, we introduce Ocean Vista—a novel distributed protocol that

  • Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control
    VLDB J. (IF 2.904) Pub Date : 2020-09-21
    Yan Li, Hao Wang, Ngai Meng Kou, Leong Hou U, Zhiguo Gong

    Crowdsourced query processing is an emerging technique that tackles computationally challenging problems by human intelligence. The basic idea is to decompose a computationally challenging problem into a set of human-friendly microtasks (e.g., pairwise comparisons) that are distributed to and answered by the crowd. The solution of the problem is then computed (e.g., by aggregation) based on the crowdsourced

  • TADOC: Text analytics directly on compression
    VLDB J. (IF 2.904) Pub Date : 2020-09-19
    Feng Zhang, Jidong Zhai, Xipeng Shen, Dalin Wang, Zheng Chen, Onur Mutlu, Wenguang Chen, Xiaoyong Du

    This article provides a comprehensive description of text analytics directly on compression (TADOC), which enables direct document analytics on compressed textual data. The article explains the concept of TADOC and the challenges to its effective realizations. Additionally, a series of guidelines and technical solutions that effectively address those challenges, including the adoption of a hierarchical

  • Autoscaling tiered cloud storage in Anna
    VLDB J. (IF 2.904) Pub Date : 2020-09-09
    Chenggang Wu, Vikram Sreekanti, Joseph M. Hellerstein

    In this paper, we describe how we extended a distributed key-value store called Anna into an autoscaling, multi-tier service for the cloud. In its extended form, Anna is designed to overcome the narrow cost–performance limitations typical of current cloud storage systems. We describe three key aspects of Anna’s new design: multi-master selective replication of hot keys, a vertical tiering of storage

  • Querying subjective data
    VLDB J. (IF 2.904) Pub Date : 2020-09-08
    Yuliang Li, Aaron Feng, Jinfeng Li, Shuwei Chen, Saran Mumick, Alon Halevy, Vivian Li, Wang-Chiew Tan

    Online users are constantly seeking experiences, such as a hotel with clean rooms and a lively bar, or a restaurant for a romantic rendezvous. However, e-commerce search engines only support queries involving objective attributes such as location, price, and cuisine, and any experiential data is relegated to text reviews. In order to support experiential queries, a database system needs to model subjective

  • Continuous top-k spatial–keyword search on dynamic objects
    VLDB J. (IF 2.904) Pub Date : 2020-09-05
    Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, Hiroyuki Kitagawa

    As the popularity of SNS- and GPS-equipped mobile devices rapidly grows, numerous location-based applications have emerged. A common scenario is that a large number of users change location and interests from time to time; e.g., a user watches news, blogs, and videos while moving outside. Many online services have been developed based on continuously querying spatial–keyword objects. For instance,

  • Interactive checks for coordination avoidance
    VLDB J. (IF 2.904) Pub Date : 2020-09-05
    Michael Whittaker, Joseph M. Hellerstein

    Strongly consistent distributed systems are easy to reason about but face fundamental limitations in availability and performance. Weakly consistent systems can be implemented with very high performance but place a burden on the application developer to reason about complex interleavings of execution. Invariant confluence provides a formal framework for understanding when we can get the best of both

  • Cohort analytics: efficiency and applicability
    VLDB J. (IF 2.904) Pub Date : 2020-08-27
    Behrooz Omidvar-Tehrani, Sihem Amer-Yahia, Laks V. S. Lakshmanan

    The abundant availability of health-care data calls for effective analysis methods to help medical experts gain a better understanding of their patients and their health. The focus of existing work has been largely on prediction. In this paper, we introduce Core, a framework for cohort “representation” and “exploration.” Our contributions are twofold: First, we formalize cohort representation as the

  • Temporal locality-aware sampling for accurate triangle counting in real graph streams
    VLDB J. (IF 2.904) Pub Date : 2020-08-12
    Dongjin Lee, Kijung Shin, Christos Faloutsos

    If we cannot store all edges in a dynamic graph, which edges should we store to estimate the triangle count accurately? Counting triangles (i.e., cliques of size three) is a fundamental graph problem with many applications in social network analysis, web mining, anomaly detection, etc. Recently, much effort has been made to accurately estimate the counts of global triangles (i.e., all triangles) and

  • Incremental preference adjustment: a graph-theoretical approach
    VLDB J. (IF 2.904) Pub Date : 2020-08-03
    Liangjun Song, Junhao Gan, Zhifeng Bao, Boyu Ruan, H. V. Jagadish, Timos Sellis

    Learning users’ preferences is critical to personalized search and recommendation. Most such systems depend on lists of items rank-ordered according to the user’s preference. Ideally, we want the system to adjust its estimate of users’ preferences after every interaction, thereby becoming progressively better at giving the user what she wants. We also want these adjustments to be gradual and explainable

  • Faster & strong: string dictionary compression using sampling and fast vectorized decompression
    VLDB J. (IF 2.904) Pub Date : 2020-07-20
    Robert Lasch, Ismail Oukid, Roman Dementiev, Norman May, Suleyman S. Demirsoy, Kai-Uwe Sattler

    String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, these come with impractical access and compression times. Therefore, lightweight algorithms such as front coding (PFC) are favored in practice. This paper endeavors to make strong string dictionary compression practical. We focus on Re-Pair Front

  • VIP: A SIMD vectorized analytical query engine
    VLDB J. (IF 2.904) Pub Date : 2020-07-13
    Orestis Polychroniou, Kenneth A. Ross

    Query execution engines for analytics are continuously adapting to the underlying hardware in order to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging in mainstream CPUs and new processor designs such as the many-core Intel Xeon Phi CPUs that rely on SIMD vectorization to achieve high performance per core while packing a greater number of smaller cores

  • Scalable data series subsequence matching with ULISSE
    VLDB J. (IF 2.904) Pub Date : 2020-07-04
    Michele Linardi, Themis Palpanas

    Data series similarity search is an important operation, and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data

  • Automatic weighted matching rectifying rule discovery for data repairing
    VLDB J. (IF 2.904) Pub Date : 2020-06-09
    Hiba Abu Ahmad, Hongzhi Wang

    Data repairing is a key problem in data cleaning which aims to uncover and rectify data errors. Traditional methods depend on data dependencies to check the existence of errors in data, but they fail to rectify the errors. To overcome this limitation, recent methods define repairing rules on which they depend to detect and fix errors. However, all existing data repairing rules are provided by experts

  • Finding skyline communities in multi-valued networks
    VLDB J. (IF 2.904) Pub Date : 2020-06-08
    Rong-Hua Li, Lu Qin, Fanghua Ye, Guoren Wang, Jeffrey Xu Yu, Xiaokui Xiao, Nong Xiao, Zibin Zheng

    Given a scientific collaboration network, how can we find a group of collaborators with high research indicator (e.g., h-index) and diverse research interests? Given a social network, how can we identify the communities that have high influence (e.g., PageRank) and also have similar interests to a specified user? In such settings, the network can be modeled as a multi-valued network where each node

  • Efficient approximation algorithms for adaptive influence maximization
    VLDB J. (IF 2.904) Pub Date : 2020-06-01
    Keke Huang, Jing Tang, Kai Han, Xiaokui Xiao, Wei Chen, Aixin Sun, Xueyan Tang, Andrew Lim

    Given a social network G and an integer k, the influence maximization (IM) problem asks for a seed set S of k nodes from G to maximize the expected number of nodes influenced via a propagation model. The majority of the existing algorithms for the IM problem are developed only under the non-adaptive setting, i.e., where all k seed nodes are selected in one batch without observing how they influence

  • Time series indexing by dynamic covering with cross-range constraints
    VLDB J. (IF 2.904) Pub Date : 2020-05-28
    Tao Sun, Hongbo Liu, Seán McLoone, Shaoxiong Ji, Xindong Wu

    Time series indexing plays an important role in querying and pattern mining of big data. This paper proposes a novel structure for tightly covering a given set of time series under the dynamic time warping similarity measurement. The structure, referred to as dynamic covering with cross-range constraints (DCRC), enables more efficient and scalable indexing to be developed than current hypercube-based

  • BAD to the bone: Big Active Data at its core
    VLDB J. (IF 2.904) Pub Date : 2020-05-23
    Steven Jacobs, Xikui Wang, Michael J. Carey, Vassilis J. Tsotras, Md Yusuf Sarwar Uddin

    Virtually all of today’s Big Data systems are passive in nature, responding to queries posted by their users. Instead, we are working to shift Big Data platforms from passive to active. In our view, a Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting

  • A game-based framework for crowdsourced data labeling
    VLDB J. (IF 2.904) Pub Date : 2020-05-19
    Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du

    Data labeling, which assigns data with multiple classes, is indispensable for many applications, such as machine learning and data integration. However, existing labeling solutions either incur expensive cost for large datasets or produce noisy results. This paper introduces a cost-effective labeling approach and focuses on the labeling rule generation problem that aims to generate high-quality rules

  • RHEEMix in the data jungle: a cost-based optimizer for cross-platform systems
    VLDB J. (IF 2.904) Pub Date : 2020-05-18
    Sebastian Kruse, Zoi Kaoudi, Bertty Contreras-Rojas, Sanjay Chawla, Felix Naumann, Jorge-Arnulfo Quiané-Ruiz

    Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution

  • RDF graph summarization for first-sight structure discovery
    VLDB J. (IF 2.904) Pub Date : 2020-04-30
    François Goasdoué; Paweł Guzewicz; Ioana Manolescu

    To help users get familiar with large RDF graphs, RDF summarization techniques can be used. In this work, we study quotient summaries of RDF graphs, that is: graph summaries derived from a notion of equivalence among RDF graph nodes. We make the following contributions: (i) four novel summaries which are often small and easy-to-comprehend, in the style of E–R diagrams; (ii) efficient (amortized linear-time)

  • Diversified spatial keyword search on RDF data
    VLDB J. (IF 2.904) Pub Date : 2020-03-12
    Zhi Cai; Georgios Kalamatianos; Georgios J. Fakas; Nikos Mamoulis; Dimitris Papadias

    The abundance and ubiquity of RDF data (such as DBpedia and YAGO2) necessitate their effective and efficient retrieval. For this purpose, keyword search paradigms liberate users from understanding the RDF schema and the SPARQL query language. Popular RDF knowledge bases (e.g., YAGO2) also include spatial semantics that enable location-based search. In an earlier location-based keyword search paradigm

  • Context-aware, preference-based vehicle routing
    VLDB J. (IF 2.904) Pub Date : 2020-03-11
    Chenjuan Guo; Bin Yang; Jilin Hu; Christian S. Jensen; Lu Chen

    Vehicle routing is an important service that is used by both private individuals and commercial enterprises. Drivers may have different contexts that are characterized by different routing preferences. For example, during different times of day or weather conditions, drivers may make different routing decisions such as preferring or avoiding highways. The increasing availability of vehicle trajectory

  • TurboLift: fast accuracy lifting for historical data recovery
    VLDB J. (IF 2.904) Pub Date : 2020-03-09
    Fan Yang; Faisal M. Almutairi; Hyun Ah Song; Christos Faloutsos; Nicholas D. Sidiropoulos; Vladimir Zadorozhny

    Historical data are frequently involved in situations where the available reports on time series are temporally aggregated at different levels, e.g., the monthly counts of people infected with measles. In real databases, the time periods covered by different reports can have overlaps (i.e., time-ticks covered by more than one report) or gaps (i.e., time-ticks not covered by any report). However, data

  • Top-k term publish/subscribe for geo-textual data streams
    VLDB J. (IF 2.904) Pub Date : 2020-03-09
    Lisi Chen; Shuo Shang; Christian S. Jensen; Jianliang Xu; Panos Kalnis; Bin Yao; Ling Shao

    Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. With streams of such data, which includes check-ins and geo-tagged tweets, available, users may be interested in being kept up-to-date on which terms are popular in the streams in a particular region of space. To enable this functionality, we aim at efficiently processing two types of

  • Efficient (\(\alpha , \beta \))-core computation in bipartite graphs
    VLDB J. (IF 2.904) Pub Date : 2020-03-04
    Boge Liu; Long Yuan; Xuemin Lin; Lu Qin; Wenjie Zhang; Jingren Zhou

    The problem of computing (\(\alpha , \beta \))-core in a bipartite graph for given \(\alpha \) and \(\beta \) is a fundamental problem in bipartite graph analysis and can be used in many applications such as online group recommendation and fraudster detection. The existing solution for computing (\(\alpha , \beta \))-core needs to traverse the entire bipartite graph once and ignores the fact that real-world

  • Architecture of a distributed storage that combines file system, memory and computation in a single layer
    VLDB J. (IF 2.904) Pub Date : 2020-02-26
    Jia Zou; Arun Iyengar; Chris Jermaine

    Storage and memory systems for modern data analytics are heavily layered, managing shared persistent data, cached data, and non-shared execution data in separate systems such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper, we propose a single system

  • Finding k-shortest paths with limited overlap
    VLDB J. (IF 2.904) Pub Date : 2020-02-21
    Theodoros Chondrogiannis; Panagiotis Bouros; Johann Gamper; Ulf Leser; David B. Blumenthal

    In this paper, we investigate the computation of alternative paths between two locations in a road network. More specifically, we study the k-shortest paths with limited overlap (\(k\text {SPwLO}\)) problem that aims at finding a set of k paths such that all paths are sufficiently dissimilar to each other and as short as possible. To compute \(k\text {SPwLO}\) queries, we propose two exact algorithms

  • Efficient maximum clique computation and enumeration over large sparse graphs
    VLDB J. (IF 2.904) Pub Date : 2020-02-15
    Lijun Chang

    This paper studies the problem of maximum clique computation (MCC) over sparse graphs, as large real-world graphs are usually sparse. In the literature, the problem of MCC over sparse graphs has been studied separately and less extensively than its dense counterpart—MCC over dense graphs—and advanced algorithmic techniques that are developed for MCC over dense graphs have not been utilized in the existing

  • FERRARI: an efficient framework for visual exploratory subgraph search in graph databases
    VLDB J. (IF 2.904) Pub Date : 2020-01-30
    Chaohui Wang; Miao Xie; Sourav S. Bhowmick; Byron Choi; Xiaokui Xiao; Shuigeng Zhou

    The exploratory search paradigm assists users who do not have a clear search intent and are unfamiliar with the underlying data space. Query formulation evolves iteratively in this paradigm as a user becomes more familiar with the content. Although exploratory search has received significant attention recently in the context of structured data, scant attention has been paid to graph-structured data. An

  • SKCompress: compressing sparse and nonuniform gradient in distributed machine learning
    VLDB J. (IF 2.904) Pub Date : 2020-01-01
    Jiawei Jiang; Fangcheng Fu; Tong Yang; Yingxia Shao; Bin Cui

    Distributed machine learning (ML) has been extensively studied to meet the explosive growth of training data. A wide range of machine learning models are trained by a family of first-order optimization algorithms, i.e., stochastic gradient descent (SGD). The core operation of SGD is the calculation of gradients. When executing SGD in a distributed environment, the workers need to exchange local gradients

  • OrpheusDB: bolt-on versioning for relational databases (extended version)
    VLDB J. (IF 2.904) Pub Date : 2019-12-20
    Silu Huang; Liqi Xu; Jialin Liu; Aaron J. Elmore; Aditya Parameswaran

    Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce OrpheusDB, a dataset version control system that “bolts on” versioning capabilities to a

  • Efficient query autocompletion with edit distance-based error tolerance
    VLDB J. (IF 2.904) Pub Date : 2019-12-14
    Jianbin Qin; Chuan Xiao; Sheng Hu; Jie Zhang; Wei Wang; Yoshiharu Ishikawa; Koji Tsuda; Kunihiko Sadakane

    Query autocompletion is an important feature saving users many keystrokes from typing the entire query. In this paper, we study the problem of query autocompletion that tolerates errors in users’ input using edit distance constraints. Previous approaches index data strings in a trie, and continuously maintain all the prefixes of data strings whose edit distances from the query string are within the

  • General dynamic Yannakakis: conjunctive queries with theta joins under updates
    VLDB J. (IF 2.904) Pub Date : 2019-11-19
    Muhammad Idris; Martín Ugarte; Stijn Vansummeren; Hannes Voigt; Wolfgang Lehner

    The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. In prior work, we have proposed general dynamic Yannakakis (GDyn), a general framework for dynamically processing acyclic conjunctive queries with \(\theta \)-joins in the presence of data updates. Whereas traditional approaches face a trade-off between materialization of subresults (to avoid

  • Top-k relevant semantic place retrieval on spatiotemporal RDF data
    VLDB J. (IF 2.904) Pub Date : 2019-11-19
    Dingming Wu; Hao Zhou; Jieming Shi; Nikos Mamoulis

    RDF data are traditionally accessed using structured query languages, such as SPARQL. However, this requires users to understand the language as well as the RDF schema. Keyword search on RDF data aims at relieving users from these requirements; users only input a set of keywords, and the goal is to find small RDF subgraphs that contain all keywords. At the same time, popular RDF knowledge bases also

  • Making data visualization more efficient and effective: a survey
    VLDB J. (IF 2.904) Pub Date : 2019-11-19
    Xuedi Qin; Yuyu Luo; Nan Tang; Guoliang Li

    Data visualization is crucial in today’s data-driven business world, which has been widely used for helping decision making that is closely related to major revenues of many industrial companies. However, due to the high demand of data processing w.r.t. the volume, velocity, and veracity of data, there is an emerging need for database experts to help for efficient and effective data visualization.

  • Adaptive partitioning and indexing for in situ query processing
    VLDB J. (IF 2.904) Pub Date : 2019-11-15
    Matthaios Olma; Manos Karpathiotakis; Ioannis Alagiannis; Manos Athanassoulis; Anastasia Ailamaki

    The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. To alleviate the loading cost, in situ query processing systems operate directly over raw data and offer instant access to data. At the same time, analytical workloads have increasing

  • Evaluating interactive data systems
    VLDB J. (IF 2.904) Pub Date : 2019-11-13
    Protiva Rahman; Lilong Jiang; Arnab Nandi

    Interactive query interfaces have become a popular tool for ad hoc data analysis and exploration. Compared with traditional systems that are optimized for throughput or batched performance, these systems focus more on user-centric interactivity. This poses a new class of performance challenges to the backend, which are further exacerbated by the advent of new interaction modes (e.g., touch, gesture)

  • Correction: A survey of community search over big graphs
    VLDB J. (IF 2.904) Pub Date : 2019-11-11
    Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, Xuemin Lin

    In the original article, Table 1 was published with incorrect figures. The correct Table 1 is given below.

  • Cleaning data with Llunatic
    VLDB J. (IF 2.904) Pub Date : 2019-11-08
    Floris Geerts; Giansalvatore Mecca; Paolo Papotti; Donatello Santoro

    Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy to repair conflicting values and are specialized toward specific classes

  • The core decomposition of networks: theory, algorithms and applications
    VLDB J. (IF 2.904) Pub Date : 2019-11-04
    Fragkiskos D. Malliaros; Christos Giatsidis; Apostolos N. Papadopoulos; Michalis Vazirgiannis

    The core decomposition of networks has attracted significant attention due to its numerous applications in real-life problems. Simply stated, the core decomposition of a network (graph) assigns to each graph node v, an integer number c(v) (the core number), capturing how well v is connected with respect to its neighbors. This concept is strongly related to the concept of graph degeneracy, which has

  • Explaining Natural Language query results
    VLDB J. (IF 2.904) Pub Date : 2019-11-02
    Daniel Deutch; Nave Frost; Amir Gilad

    Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming

  • EntropyDB: a probabilistic approach to approximate query processing
    VLDB J. (IF 2.904) Pub Date : 2019-11-02
    Laurel Orr; Magdalena Balazinska; Dan Suciu

    We present EntropyDB, an interactive data exploration system that uses a probabilistic approach to generate a small, query-able summary of a dataset. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulation of our probabilistic

  • Efficient processing of moving collective spatial keyword queries
    VLDB J. (IF 2.904) Pub Date : 2019-11-01
    Hongfei Xu; Yu Gu; Yu Sun; Jianzhong Qi; Ge Yu; Rui Zhang

    As a major type of continuous spatial queries, the moving spatial keyword queries have been studied extensively. Most existing studies focus on retrieving single objects, each of which is close to the query object and relevant to the query keywords. Nevertheless, a single object may not satisfy all the needs of a user, e.g., a user who is driving may want to withdraw money, wash her car, and buy some

  • Fast stochastic routing under time-varying uncertainty
    VLDB J. (IF 2.904) Pub Date : 2019-10-31
    Simon Aagaard Pedersen; Bin Yang; Christian S. Jensen

    Data are increasingly available that enable detailed capture of travel costs associated with the movements of vehicles in road networks, notably travel time, and greenhouse gas emissions. In addition to varying across time, such costs are inherently uncertain, due to varying traffic volumes, weather conditions, different driving styles among drivers, etc. In this setting, we address the problem of

  • In-memory database acceleration on FPGAs: a survey
    VLDB J. (IF 2.904) Pub Date : 2019-10-26
    Jian Fang; Yvo T. B. Mulder; Jan Hidders; Jinho Lee; H. Peter Hofstee

    While FPGAs have seen prior use in database systems, in recent years interest in using FPGA to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs, which can also provide high throughput, and are easier to

  • Efficient community discovery with user engagement and similarity
    VLDB J. (IF 2.904) Pub Date : 2019-10-26
    Fan Zhang; Xuemin Lin; Ying Zhang; Lu Qin; Wenjie Zhang

    In this paper, we investigate the problem of (k,r)-core which intends to find cohesive subgraphs on social networks considering both user engagement and similarity perspectives. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph) where each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we consider

  • A survey of trajectory distance measures and performance evaluation
    VLDB J. (IF 2.904) Pub Date : 2019-10-18
    Han Su; Shuncheng Liu; Bolong Zheng; Xiaofang Zhou; Kai Zheng

    The proliferation of trajectory data in various application domains has inspired tremendous research efforts to analyze large-scale trajectory data from a variety of aspects. A fundamental ingredient of these trajectory analysis tasks and applications is distance measures for effectively determining how similar two trajectories are. We conduct a comprehensive survey of the trajectory distance measures

  • Skyline queries over incomplete data streams
    VLDB J. (IF 2.904) Pub Date : 2019-10-17
    Weilong Ren; Xiang Lian; Kambiz Ghazinour

    Nowadays, efficient and effective processing over massive stream data has attracted much attention from the database community, which are useful in many real applications such as sensor data monitoring, network intrusion detection, and so on. In practice, due to the malfunction of sensing devices or imperfect data collection techniques, real-world stream data may often contain missing or incomplete

  • Parsing gigabytes of JSON per second
    VLDB J. (IF 2.904) Pub Date : 2019-10-11
    Geoff Langdale; Daniel Lemire

    JavaScript Object Notation or JSON is a ubiquitous data exchange format on the web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as possible. Despite the maturity of the problem of JSON parsing, we show that substantial speedups are possible. We present the first standard-compliant JSON parser to process

  • Parallelizing approximate single-source personalized PageRank queries on shared memory
    VLDB J. (IF 2.904) Pub Date : 2019-10-08
    Runhui Wang; Sibo Wang; Xiaofang Zhou

    Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) \(\pi (s,t)\) measures the importance of node t with respect to node s. In this work, we study the single-source PPR query, which takes a source node s as input and outputs the PPR values of all nodes in G with respect to s. The single-source PPR query finds many important applications, e.g., community detection

  • One-pass trajectory simplification using the synchronous Euclidean distance
    VLDB J. (IF 2.904) Pub Date : 2019-10-04
    Xuelian Lin; Jiahao Jiang; Shuai Ma; Yimeng Zuo; Chunming Hu

    Various mobile devices have been used to collect, store and transmit tremendous trajectory data, and it is known that raw trajectory data seriously wastes the storage, network bandwidth and computing resource. To attack this issue, one-pass line simplification (\(\textsf {LS} \)) algorithms have been developed, by compressing data points in a trajectory to a set of continuous line segments. However

  • Efficient distributed reachability querying of massive temporal graphs
    VLDB J. (IF 2.904) Pub Date : 2019-09-28
    Tianming Zhang; Yunjun Gao; Lu Chen; Wei Guo; Shiliang Pu; Baihua Zheng; Christian S. Jensen

    Reachability computation is a fundamental graph functionality with a wide range of applications. In spite of this, little work has as yet been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks, such as communication networks, social networks, and transportation schedule networks. Moreover, we are faced with increasingly large real-world

  • Coconut: sortable summarizations for scalable indexes over static and streaming data series
    VLDB J. (IF 2.904) Pub Date : 2019-09-25
    Haridimos Kondylakis; Niv Dayan; Kostas Zoumpatianos; Themis Palpanas

    Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance, or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing

  • Microblogs data management: a survey
    VLDB J. (IF 2.904) Pub Date : 2019-09-18
    Amr Magdy; Laila Abdelhafeez; Yunfan Kang; Eric Ong; Mohamed F. Mokbel

    Microblogs data is the microlength user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of interests including targeted advertising, market reports

  • An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query
    VLDB J. (IF 2.904) Pub Date : 2019-09-14
    Min Xie; Raymond Chi-Wing Wong; Ashwin Lall

    When faced with a database containing millions of tuples, a user may be only interested in a (typically much) smaller representative subset. Recently, a query called the regret minimization query was proposed toward this purpose to create such a subset for users. Specifically, this query finds a set of tuples that minimizes the user regret (measured by how far the user’s favorite tuple in the selected

Contents have been reproduced by permission of the publishers.