-
Where do Databases and Digital Forensics meet? A Comprehensive Survey and Taxonomy ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Danilo B. Seufitelli, Michele A. Brandão, Ayane C. A. Fernandes, Kayque M. Siqueira, Mirella M. Moro
We present a systematic literature review and propose a taxonomy for research at the intersection of Digital Forensics and Databases. The merge between these two areas has become more prolific due to the growing volume of data and mobile apps on the Web, and the consequent rise in cyber attacks. Our review has identified 91 relevant papers. The taxonomy categorizes such papers into: Cyber-Attacks (subclasses
-
Query Evaluation under Differential Privacy ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Wei Dong, Ke Yi
Differential privacy has garnered significant attention in recent years due to its potential in offering robust privacy protection for individual data during analysis. With the increasing volume of sensitive information being collected by organizations and analyzed through SQL queries, the development of a general-purpose query engine that is capable of supporting a broad range of queries while maintaining
-
Apache Wayang: A Unified Data Analytics Framework ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Kaustubh Beedkar, Bertty Contreras-Rojas, Haralampos Gavriilidis, Zoi Kaoudi, Volker Markl, Rodrigo Pardo-Meza, Jorge-Arnulfo Quiané-Ruiz
The large variety of specialized data processing platforms and the increased complexity of data analytics has led to the need for unifying data analytics within a single framework. Such a framework should free users from the burden of (i) choosing the right platform(s) and (ii) gluing code between the different parts of their pipelines. Apache Wayang (Incubating) is the only open-source framework
-
Reminiscences on Influential Papers ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Renata Borovica-Gajic
This issue's contributors chose papers that address challenges at the heart of database systems: physical design tuning for index selection and transaction isolation levels. Both contributions emphasize the elegant, modular, and long-lasting design choices of the respective work. Enjoy reading!
-
Kùzu: A Database Management System For ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Semih Salihoglu
I would like to share my opinions on the following question: how should a modern graph DBMS (GDBMS) be architected? This is the motivating research question we are addressing in the Kùzu project at University of Waterloo [4, 5]. I will argue that a modern GDBMS should optimize for a set of what I will call, for lack of a better term, "beyond relational" workloads. As a background, let me start with
-
Proactive Resource Allocation Policy for Microsoft Azure Cognitive Search ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Olga Poppe, Pablo Castro, Willis Lang, Jyoti Leeka
Modern cloud services aim to find the middle ground between quality of service and operational cost efficiency by allocating resources if and only if these resources are needed by the customers. Unfortunately, most industrial demand-driven resource allocation approaches are reactive. Given that scaling mechanisms are not instantaneous, the reactive policy may introduce delays to latency-sensitive customer
-
From Large Language Models to Databases and Back: A Discussion on Research and Education ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-11-02 Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang
In recent years, large language models (LLMs) have garnered increasing attention from both academia and industry due to their potential to facilitate natural language processing (NLP) and generate high-quality text. Despite their benefits, however, the use of LLMs is raising concerns about the reliability of knowledge extraction. The combination of DB research and data science has advanced the state
-
The Shapley Value in Database Management ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Leopoldo Bertossi, Benny Kimelfeld, Ester Livshits, Mikaël Monet
Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions from the data, as part of the explanation of what led to these conclusions. In Artificial Intelligence, Machine Learning, and Data Management, some of the common scores are deployments of the Shapley value, a formula for profit sharing in cooperative game theory. Since its invention
-
Knowledge Graphs Querying ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Arijit Khan
Knowledge graphs (KGs) such as DBpedia, Freebase, YAGO, Wikidata, and NELL were constructed to store large-scale, real-world facts as (subject, predicate, object) triples - that can also be modeled as a graph, where a node (a subject or an object) represents an entity with attributes, and a directed edge (a predicate) is a relationship between two entities. Querying KGs is critical in web search, question
-
Reminiscences on Influential Papers ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Ashraf Aboulnaga
This issue's contributors highlight their influences when it comes to their research agenda on parallel data processing and skyline queries, respectively. Enjoy reading! While I will keep inviting members of the data management community, and neighboring communities, to contribute to this column, I also welcome unsolicited contributions. Please contact me if you are interested.
-
Mid-Career Academics: What I Have Learned or Wish I Had Known ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Qiong Luo
As I agreed on contributing a piece to this series of which Tamer is in charge, I read all the preceding articles to get inspiration. The impression I got was, "gosh, I wish I had known all this back in my mid-career days and even these days!" For example, I did place students on my collaborative projects, but would have made faster progress in some projects if I had insisted on having weekly meetings
-
Efficient Data Sharing across Trust Domains ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Natacha Crooks
Cross-Trust-Domain Processing. Data is now a commodity. We know how to compute and store it efficiently and reliably at scale. We have, however, paid less attention to the notion of trust. Yet, data owners today are no longer the entities storing or processing their data (medical records are stored on the cloud, data is shared across banks, etc.). In fact, distributed systems today consist of many
-
Diversity, Equity and Inclusion Activities in Database Conferences: A 2022 Report ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Sihem Amer-Yahia, Divyakant Agrawal, Yael Amsterdamer, Sourav S. Bhowmick, Angela Bonifati, Renata Borovica-Gajic, Jesús Camacho-Rodríguez, Barbara Catania, Panos K. Chrysanthis, Carlo Curino, Jérôme Darmont, Gillian Dobbie, Amr El Abbadi, Avrilia Floratou, Juliana Freire, Alekh Jindal, Vana Kalogeraki, Sujaya Maiyya, Alexandra Meliou, Madhulika Mohanty, Behrooz Omidvar-Tehrani, Fatma Özcan, Liat Peterfreund
The Diversity, Equity and Inclusion (DEI) initiative started as the Diversity/Inclusion initiative in 2020 [4]. The current report summarizes our activities in 2022. Our responsibility as a community is to ensure that attendees of DB conferences feel included, irrespective of their scientific perspective and personal background. One of the first steps was to establish the role of the DEI chairs at
-
Experiences and Lessons Learned from the SIGMOD Entity Resolution Programming Contests ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Andrea De Angelis, Maurizio Mazzei, Federico Piai, Paolo Merialdo, Giovanni Simonini, Luca Zecchini, Sonia Bergamaschi, Donatella Firmani, Xu Chu, Peng Li, Renzhi Wu
We report our experience in running three editions (2020, 2021, 2022) of the SIGMOD programming contest, a well-known event for students to engage in solving exciting data management problems. During this period we had the opportunity of introducing participants to the entity resolution task, which is of paramount importance in the data integration community. We aim at sharing the executive decisions
-
The Role of Data Scientists in Modern Enterprises - Experience from Data Science Education ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Thoralf Mildenberger, Martin Braschler, Andreas Ruckstuhl, Robert Vorburger, Kurt Stockinger
"Data Scientist" has often been considered as the sexiest job of the 21st century. As a consequence, the spectrum of data science education programs has increased significantly in recent years, and there is a high demand for data scientists at many companies. However, what training is required to become a data scientist? What is the role of data scientists in current enterprises? Is the training well-aligned
-
Report on the Workshop on Factorized Databases ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-08-11 Dan Olteanu
The workshop took place in Zurich and online from August 2 to 4, 2022. It was attended by researchers from 17 academic institutions and industry labs, including Microsoft Gray Systems Lab, Omics Data Automation, Oracle Labs Zurich, RelationalAI, and TigerGraph. It featured 18 talks and plenty of opportunities for discussions. The vast majority of participants attended in person.
-
TECHNICAL PERSPECTIVE: Ad Hoc Transactions: What They Are and Why We Should Care ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Kenneth Salem
Most database research papers are prescriptive. They identify a technical problem and show us how to solve it. They present new algorithms, theorems, and evaluations of prototypes. Other papers follow a different path: descriptive rather than prescriptive. They tell us how data systems behave in practice, and how they are actually used. They employ a different set of tools, such as surveys, software
-
Ad Hoc Transactions: What They Are and Why We Should Care ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Chuzhe Tang, Zhaoguo Wang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, Haibo Chen
Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. We refer to database operations coordinated by application code as ad hoc transactions. Until now, little is known about them. This paper presents the first comprehensive study on ad hoc
-
Technical Perspective: Sortledton: a Universal Graph Data Structure ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Angela Bonifati
Graph processing is becoming ubiquitous due to the proliferation of interconnected data in several domains, including life sciences, social networks, cybersecurity, finance and logistics, to name a few. In parallel with the growth of the underlying graph data sources, a plethora of graph workloads have appeared, ranging from graph analytics to graph traversals and graph pattern matching. Graph systems
-
Sortledton: a Universal Graph Data Structure ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Per Fuchs, Domagoj Margan, Jana Giceva
Despite the wide adoption of graph processing across many different application domains, there is no underlying data structure that can serve a variety of graph workloads (analytics, traversals, and pattern matching) on dynamic graphs with single edge updates.
-
Technical Perspective for Skeena: Efficient and Consistent Cross-Engine Transactions ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Carsten Binnig
The paper proposes a solution to the problem of inadequate support for transactions in multi-engine database systems. Multi-engine database systems are databases that integrate new (fast) memory-optimized storage engines with (slow) traditional engines, allowing the application to use tables in both engines. Multi-engine database systems are in particular interesting for traditional database systems
-
Efficiently Making Cross-Engine Transactions Consistent ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Jianqiu Zhang, Kaisong Huang, Tianzheng Wang, King Lv
Database systems are becoming increasingly multi-engine. In particular, a main-memory engine may coexist with a traditional storage-centric engine in a system to support various applications. It is desirable to allow applications to access data in both engines using cross-engine transactions. But existing systems are either only designed for single-engine accesses, or impose many restrictions by limiting
-
Technical Perspective: When is it safe to run a transactional workload under Read Committed? ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Alan D. Fekete
A data management platform provides many capabilities to assist the data owner, application coder, or end-user. For example, it should support an expressive query language, schema definition, and sophisticated access control. Another way many platforms add value is through a transaction mechanism, which allows the application programmer to indicate that a stretch of code, including multiple accesses
-
When is it safe to run a transactional workload under Read Committed? ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Brecht Vandevoort, Bas Ketsman, Christoph Koch, Frank Neven
The popular isolation level multiversion Read Committed (RC) exchanges some of the strong guarantees of serializability for increased transaction throughput. Nevertheless, transaction workloads can sometimes be executed under RC while still guaranteeing serializability at a reduced cost. Such workloads are said to be robust against RC. This paper provides a high level overview of deciding robustness
-
Technical Perspective for Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Tim Kraska
Separation of compute and storage has become the de facto standard for cloud database systems. First proposed in 2007 for database systems [2], it is now widely adopted by all major cloud providers such as Amazon Redshift, Google BigQuery, and Snowflake. Separation of compute and storage adds enormous value for the customer. Users can scale storage independently of compute, which enables them to only
-
Building Write-Optimized Tree Indexes on Disaggregated Memory ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Qing Wang, Youyou Lu, Jiwu Shu
Memory disaggregation architecture physically separates CPU and memory into independent components, which are connected via high-speed RDMA networks, greatly improving resource utilization of database systems. However, such an architecture poses unique challenges to data indexing due to limited RDMA semantics and near-zero computation power at memory side. Existing indexes supporting disaggregated
-
Technical Perspective: Conjunctive Queries with Comparisons ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Stijn Vansummeren
Query processing, the art of efficiently executing a relational query on a given database, is a foundational and core area in data management research. Established at the dawn of relational database systems in the 1970's, relational query processing remains a highly relevant and vibrant research topic today as recent work shows that, apart from its application in traditional database scenarios, it
-
Conjunctive Queries with Comparisons ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Qichen Wang, Ke Yi
Conjunctive queries with predicates in the form of comparisons that span multiple relations have regained interest recently, due to their relevance in OLAP queries, spatiotemporal databases, and machine learning over relational data. The standard technique, predicate pushdown, has limited efficacy on such comparisons. A technique by Willard can be used to process short comparisons that are adjacent
-
Technical Perspective: Query Answers - Fewer is Faster ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Leonid Libkin
We often write queries using LIMIT k, indicating that only k answers are to be returned. This feature is present in most query languages, for different data models: SQL, SPARQL, Cypher etc. For example, in a repository of about 250M SPARQL queries, about 15M queries are of this form. Not surprisingly of course, the database research community studied such queries extensively. The dominant setting is
-
Threshold Queries ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Angela Bonifati, Stefania Dumbrava, George Fletcher, Jan Hidders, Matthias Hofer, Wim Martens, Filip Murlak, Joshua Shinavier, Slawek Staworko, Dominik Tomaszuk
Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. We explore how such queries appear in practice and present a method that can be used to significantly
-
Technical Perspective: (Pre-) Semirings Come to the Recursion Party ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Atri Rudra
(This article is an imagined conversation with my U. at Buffalo UG algorithms class students.)
-
Convergence of Datalog over (Pre-) Semirings ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Mahmoud Abo Khamis, Hung Q. Ngo, Reinhard Pichler, Dan Suciu, Yisu Remy Wang
Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary semiring
-
Technical Perspective: Optimal Algorithms for Multiway Search on Partial Orders ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Rajesh Jayaram
Given a list of comparable items A = {a1, ..., an} sorted so that a1 < a2 < ... < an, a canonical problem is locating a target item q within A if it exists. The canonical algorithm for this problem, of course, is binary search, which locates q using at most O(log n) comparisons between q and elements of A. Binary search is an indispensable tool for totally ordered datasets. However, many naturally
-
An Optimal Algorithm for Partial Order Multiway Search ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Shangqi Lu, Wim Martens, Matthias Niewerth, Yufei Tao
Partial order multiway search (POMS) is an important problem that finds use in crowdsourcing, distributed file systems, software testing, etc. In this problem, a game is played between an algorithm A and an oracle, based on a directed acyclic graph G known to both parties. First, the oracle picks a vertex t in G called the target; then, A aims to figure out which vertex is t by probing reachability
-
Technical Perspective: Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Dan Suciu
Query engines are really good at choosing an efficient query plan. Users don't need to worry about how they write their query, since the optimizer makes all the right choices for executing the query, while taking into account all aspects of data, such as its size, the characteristics of the storage device, the distribution pattern, the availability of indexes, and so on. The query optimizer always
-
Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Jeremy Chen, Yuqing Huang, Mushi Wang, Semih Salihoglu, Kenneth Salem
We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins: (i) optimistic estimators, which were defined in the context of graph database management systems, that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We show that optimistic
-
Technical Perspective: Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Andreas Kipf
Query optimization is the process of finding an efficient query execution plan for a given SQL query. The runtime difference between a good and a bad plan can be tremendous. For example, in the case of TPC-H query 5, a query with 5 joins, the difference between the best and the worst plan is more than 10,000×. Therefore, it is vital to avoid bad plans. The dominating factor which differentiates a good
-
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Christina Pavlopoulou, Michael J. Carey, Vassilis J. Tsotras
Effective query optimization remains an open problem for Big Data Management Systems. In this work, we revisit an old idea, runtime dynamic optimization, and adapt it to a big data management system, AsterixDB. The approach runs in stages (re-optimization points), starting by first executing all predicates local to a single dataset. The intermediate result created by a stage is then used to re-optimize
-
Technical Perspective on 'R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys' ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Graham Cormode
Increased use of data to inform decision making has brought with it a rising awareness of the importance of privacy, and the need for appropriate mitigations to be put in place to protect the interests of individuals whose data is being processed. From the demographic statistics that are produced by national censuses, to the complex predictive models built by "big tech" companies, data is the fuel
-
R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-06-08 Wei Dong, Juanru Fang, Ke Yi, Yuchao Tao, Ashwin Machanavajjhala
Answering SPJA queries under differential privacy (DP), including graph pattern counting under node-DP as an important special case, has received considerable attention in recent years. The dual challenge of foreign-key constraints and self-joins is particularly tricky to deal with, and no existing DP mechanisms can correctly handle both. For the special case of graph pattern counting under node-DP
-
Management of Machine Learning Lifecycle Artifacts: A Survey: ACM SIGMOD Record: Vol 51, No 4 ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Marius Schlegel, Kai-Uwe Sattler
The explorative and iterative nature of developing and operating ML applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. In order to enable comparability, reproducibility, and traceability of these artifacts across the ML lifecycle steps and iterations, systems and tools have been developed to support their collection
-
Concurrency control for database theorists ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Bas Ketsman, Christoph Koch, Frank Neven, Brecht Vandevoort
The aim of this paper is to serve as a lightweight introduction to concurrency control for database theorists through a uniform presentation of the work on robustness against Multiversion Read Committed and Snapshot Isolation.
-
PDQ 2.0: Flexible Infrastructure for Integrating Reasoning and Query Planning: ACM SIGMOD Record: Vol 51, No 4 ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Michael Benedikt, Fergus Cooper, Stefano Germano, Gabor Gyorkei, Efthymia Tsamoura, Brandon Moore, Camilo Ortiz
Reasoning-based query planning has been explored in many contexts, including relational data integration, the Semantic Web, and query reformulation. But infrastructure to build reasoning-based optimization in the relational context has been slow to develop. We overview PDQ 2.0, a platform supporting a number of reasoning-enhanced querying tasks. We focus on a major goal of PDQ 2.0: obtaining a more modular
-
Reminiscences on Influential Papers ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Tilmann Rabl
When I started my PhD, I wanted to do something related to systems but I wasn't sure exactly what. I didn't consider data management systems initially, because I was unaware of the richness of the systems work that data management systems were built on. I thought the field was mainly about SQL. Luckily, that view changed quickly.
-
Mid-Career Researcher, huh?: What just Changed?: ACM SIGMOD Record: Vol 51, No 4 ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Laks V.S. Lakshmanan
You just got promoted to Associate Professor. Like most things in life, whether joys or sorrows, the joy of this accomplishment will not last forever. However, that doesn't mean that you should not look back and reflect on years of hard work and tenacity that you have put in which have earned you this promotion, so first of all, congratulations! Take a moment to savor this accomplishment. On the other
-
Report on the First International Workshop on Data Systems Education (DataEd '22) ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Efthimia Aivaloglou, George Fletcher, Michael Liut, Daphne Miedema
This report summarizes the outcomes of the first international workshop on Data Systems Education: Bridging Education Practice with Education Research (DataEd '22). The workshop was held in conjunction with the SIGMOD '22 conference in Philadelphia, USA on June 17, 2022. The aim of the workshop was to provide a dedicated venue for presenting and discussing data management systems education experiences
-
Collaborative Data Science using Scalable Homoiconicity ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Holger Pirk
Motivation: Data science is increasingly collaborative. On the one hand, results need to be distributed, e.g., as interactive visualizations. On the other, collaboration in the data development process improves quality and timeliness. This can take many forms: partitioning a problem and working on aspects in parallel, exploring different solutions or reviewing someone else's work.
-
Chenggang Wu Speaks Out on his ACM SIGMOD Jim Gray Dissertation Award, Rejection, Believing in Your Work, and More ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Chenggang Wu
Welcome to this installment of ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today we are on Zoom with Chenggang Wu, co-founder and CTO of Aqueduct. Chenggang received the 2022 ACM SIGMOD Jim Gray Dissertation Award for his thesis entitled The Design of Any-scale Serverless Infrastructure with Rich Consistency Guarantees. His
-
The World of Graph Databases from An Industry Perspective ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Yuanyuan Tian
Rapidly growing social networks and other graph data have created a high demand for graph technologies in the market. As a result, a plethora of graph databases, systems, and solutions have emerged. On the other hand, graph has long been a well-studied area in the database research community. Despite the numerous surveys on various graph research topics, there is a lack of surveys on graph technologies
-
Enterprise Platform and Integration Concepts Research at HPI ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Michael Perscheid, Hasso Plattner, Daniel Ritter, Rainer Schlosser, Ralf Teusner
The Hasso Plattner Institute (HPI), academically structured as the independent Faculty of Digital Engineering at the University of Potsdam, unites computer science research and teaching with the advantages of a privately financed institute and a tuition-free study program. Founder and namesake of the institute is the SAP co-founder Hasso Plattner, who also heads the Enterprise Platform and Integration
-
How Connected Are Our Conference Review Boards? ACM SIGMOD Rec. (IF 1.1) Pub Date : 2023-01-25 Sourav S. Bhowmick
A dense co-authorship network formed by the review board members of a conference may adversely impact the quality and integrity of the review process. In this report, we shed light on the topological characteristics of such networks for three major data management conference venues. Our results show all these venues give rise to dense networks with a large giant component. We advocate to rethink the
-
Characterizing I/O in Machine Learning with MLPerf Storage ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Oana Balmau
Data is the driving force behind machine learning (ML) algorithms. The way we ingest, store, and serve data can impact the performance of end-to-end training and inference significantly [11]. However, efficient storage and pre-processing of training data has received far less focus in ML compared to efforts in building specialized software frameworks and hardware accelerators. The amount of data that
-
Counting the Answers to a Query ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, Cristian Riveros
Counting the answers to a query is a fundamental problem in databases, with several applications in the evaluation, optimization, and visualization of queries. Unfortunately, counting query answers is a #P-hard problem in most cases, so it is unlikely to be solvable in polynomial time. Recently, new results on approximate counting have been developed, specifically by showing that some problems in automata
-
A Survey of Data Marketplaces and Their Business Models ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Santiago Andrés Azcoitia, Nikolaos Laoutaris
Data is becoming an indispensable production factor for the modern economy, matching or exceeding in importance traditional factors such as land, infrastructure, labor and capital. As part of this, a wide range of applications in different sectors require huge amounts of information to feed machine learning models and algorithms responsible for critical roles in production chains and business processes
-
Revisiting Online Data Markets in 2022: A Seller and Buyer Perspective: ACM SIGMOD Record: Vol 51, No 3 ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Javen Kennedy, Pranav Subramaniam, Sainyam Galhotra, Raul Castro Fernandez
Well-functioning data markets match sellers with buyers to allocate data effectively. Although most of today's data markets fall short of this ideal, there is a renewed interest in online data marketplaces that may fulfill the promise of data markets. In this paper, we survey participants of some of the most common data marketplaces to understand the platforms' upsides and downsides. We find that buyers
-
Reminiscences on Influential Papers ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Fatma Özcan, Yuanyuan Tian
This issue's contributors are two women who have been extremely influential mentors in my life. I feel privileged that I had the chance to work with them during my time at IBM Almaden. So I asked what influenced them. Enjoy reading!
-
The Formidable Mid-Career Crisis ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Anastasia Ailamaki
My high school grades were top except for one subject: composition. Free text was (and still is) my absolute nightmare. After high school I only had to do technical writing, which is much easier: it boils down to math. Fact, supporting evidence, implication, which leads to another fact, repeat. So, when Tamer asked me to write a piece about mid-career challenges, I was excited at first, and then I
-
Database Education at UC San Diego ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Arun Kumar, Alin Deutsch, Amarnath Gupta, Yannis Papakonstantinou, Babak Salimi, Victor Vianu
We are in the golden age of data-intensive computing. CS is now the largest major in most US universities. Data Science, ML/AI, and cloud computing have been growing rapidly. Many new data-centric job categories are taking shape in industry, e.g., data scientists, ML engineers, analytics engineers, and data associates. The DB/data management/data systems area is naturally a central part of all these
-
Query Optimizer as a Service: An Idea Whose Time Has Come!: ACM SIGMOD Record: Vol 51, No 3 ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Alekh Jindal, Jyoti Leeka
Query optimization is a critical technology that is common across all modern data processing systems. However, it is traditionally implemented in silos and is deeply embedded in different systems. Furthermore, over the years, query optimizers have become less understood and rarely touched pieces of code that are brittle to changes and very expensive to maintain, thus slowing down the pace of innovation
-
VLDB Scalable Data Science Category: The Inaugural Year: ACM SIGMOD Record: Vol 51, No 3 ACM SIGMOD Rec. (IF 1.1) Pub Date : 2022-11-21 Arun Kumar, Alon Halevy, Nesime Tatbul
As part of the International Conference on Very Large Data Bases (VLDB) 2021 / Proceedings of the VLDB Endowment Volume 14, a new Research Track category named Scalable Data Science (SDS) was launched [2, 6]. The goal of SDS is to attract cutting-edge and impactful real-world work in the scalable data science arena to enhance the impact and visibility of the VLDB community on data science practice