-
A Data Science approach analysing the Impact of Injuries on Basketball Player and Team Performance Inform. Syst. (IF 2.466) Pub Date : 2021-02-22 Vangelis Sarlis; Vasilis Chatziilias; Christos Tjortjis; Dimitris Mandalidis
The sports industry utilizes science to improve short- to long-term team and player management regarding budget, health, tactics, training, and, most importantly, performance. Data Science (DS) and Sports Analytics play key roles in supporting teams, players and experts in improving performance. This paper reviews the literature to identify important attributes correlated with injuries and attempts to quantify
-
Relational schema optimization for RDF-based knowledge graphs Inform. Syst. (IF 2.466) Pub Date : 2021-03-03 George Papastefanatos; Marios Meimaris; Panos Vassiliadis
Characteristic sets (CS) organize RDF triples based on the set of properties associated with their subject nodes. This concept was recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answering complex query workloads in the presence of schema
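To make the notion of a characteristic set concrete, the following minimal Python sketch groups subjects by the exact set of properties they use. The triples and the `ex:`, `foaf:` and `dc:` prefixes are made-up illustrations, and the function name `characteristic_sets` is hypothetical. The sketch shows only the basic concept, not the relational schema optimization proposed in the paper.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each characteristic set (frozenset of properties) to the subjects that have it."""
    # Collect the set of properties used by each subject node.
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)
    # Subjects with an identical property set share one characteristic set.
    cs_to_subjects = defaultdict(set)
    for subject, props in props_by_subject.items():
        cs_to_subjects[frozenset(props)].add(subject)
    return dict(cs_to_subjects)


# Hypothetical example triples (subject, property, object).
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:paper1", "dc:title", '"RDF"'),
]

for cs, subjects in characteristic_sets(triples).items():
    print(sorted(cs), sorted(subjects))
```

Each group corresponds to an implicit "table" in the data: `ex:alice` and `ex:bob` share the properties `foaf:name` and `foaf:knows`, while `ex:paper1` forms a separate set.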
-
Rank/select queries over mutable bitmaps Inform. Syst. (IF 2.466) Pub Date : 2021-03-03 Giulio Ermanno Pibiri; Shunsuke Kanda
The problem of answering rank/select queries over a bitmap is of utmost importance for many succinct data structures. When the bitmap does not change, many solutions exist on both the theoretical and the practical side. In this work we consider the case where one is allowed to modify the bitmap via a flip(i) operation that toggles its i-th bit. By adapting and properly extending some results concerning prefix-sum
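One textbook way to support flip, rank and select on a bitmap is to maintain prefix sums of the bits in a Fenwick (binary indexed) tree. The Python sketch below is that generic approach under stated conventions (rank(i) counts 1-bits in positions 0..i-1; select(k) returns the 0-based position of the k-th 1-bit). The class name `MutableBitmap` is hypothetical, and this is not the engineered data structure developed in the paper.

```python
class MutableBitmap:
    """Bitmap supporting flip(i), rank(i) and select(k) in O(log n) via a Fenwick tree."""

    def __init__(self, n):
        self.n = n
        self.bits = [0] * n
        self.tree = [0] * (n + 1)  # 1-indexed Fenwick tree of prefix sums over the bits

    def _add(self, i, delta):
        # Propagate a +1/-1 change at 0-based position i to all covering tree nodes.
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def flip(self, i):
        """Toggle the i-th bit and update the prefix sums."""
        self.bits[i] ^= 1
        self._add(i, 1 if self.bits[i] else -1)

    def rank(self, i):
        """Number of 1-bits in positions 0..i-1."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def select(self, k):
        """0-based position of the k-th 1-bit (k counted from 1)."""
        if k < 1 or k > self.rank(self.n):
            raise ValueError("k out of range")
        pos = 0
        step = 1 << self.n.bit_length()
        # Descend the tree to find the largest prefix whose sum is still < k.
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < k:
                pos = nxt
                k -= self.tree[nxt]
            step >>= 1
        return pos


bm = MutableBitmap(8)
for i in (1, 3, 4, 6):
    bm.flip(i)
print(bm.rank(5), bm.select(2))  # 3 3
```

The Fenwick tree keeps every operation logarithmic with a simple array layout; succinct designs would instead aim to keep the extra space well below the n words used here.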
-
GoFast: Graph-based optimization for efficient and scalable query evaluation Inform. Syst. (IF 2.466) Pub Date : 2021-02-17 Ishaq Zouaghi; Amin Mesmoudi; Jorge Galicia; Ladjel Bellatreche; Taoufik Aguili
The popularity of the Resource Description Framework (RDF) and SPARQL has spurred the development of high-performance systems to manage data represented with this model. Former approaches adapted the well-established relational model, applying its storage, query processing, and optimization strategies. However, the techniques borrowed from the relational model are not universally applicable in the RDF
-
Empowering conformance checking using Big Data through horizontal decomposition Inform. Syst. (IF 2.466) Pub Date : 2021-02-18 Álvaro Valencia-Parra; Ángel Jesús Varela-Vaca; María Teresa Gómez-López; Josep Carmona; Robin Bergenthum
Conformance checking unleashes the full power of process mining: techniques from this discipline enable the analysis of the quality of a process model through the discovery of event data, the identification of potential deviations, and the projection of real traces onto process models. In this way, the insights gained from the available event data can be transferred to a richer conceptual level, amenable
-
Rating prediction based on combination of review mining and user preference analysis Inform. Syst. (IF 2.466) Pub Date : 2021-02-14 Chin-Hui Lai; Chia-Yu Hsu
Review websites allow users to share their reviews of products or businesses, give ratings to products or businesses, and interact with other users. Due to the rapid growth of online review data, users face the problem of information overload. To resolve this problem, many researchers have proposed various recommendation methods based on the analysis of users’ ratings. Besides user ratings, the review
-
COOL: A framework for conversational OLAP Inform. Syst. (IF 2.466) Pub Date : 2021-02-23 Matteo Francia; Enrico Gallinucci; Matteo Golfarelli
The democratization of data access and the adoption of OLAP in scenarios requiring hands-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we introduce COOL, a framework devised for COnversational OLap applications. COOL interprets and translates a natural language dialogue into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query
-
A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter Inform. Syst. (IF 2.466) Pub Date : 2021-02-12 Reem Alharthi; Areej Alhothali; Kawthar Moria
Social networks have generated immense amounts of data that have been successfully utilized for research and business purposes. The approachability and immediacy of social media have also allowed ill-intentioned users to perform several harmful activities that include spamming, promoting, and phishing. These activities generate massive amounts of low-quality content that often exhibits duplicate, automated
-
Query-centric regression Inform. Syst. (IF 2.466) Pub Date : 2021-02-14 Qingzhi Ma; Peter Triantafillou
Regression Models (RMs), and Machine Learning (ML) models in general, aim to offer high prediction accuracy, even for unforeseen queries/datasets. This depends on their fundamental ability to generalize. However, overfitting a model with respect to the current DB state may be best suited to offer excellent accuracy. This overfit-generalize divide bears many practical implications faced by a data analyst
-
Privacy-preserving location data stream clustering on mobile edge computing and cloud Inform. Syst. (IF 2.466) Pub Date : 2021-02-10 Veronika Stephanie; M.A.P. Chamikara; Ibrahim Khalil; Mohammed Atiquzzaman
The advancements in positioning technologies have led to the emergence of various location-based services, resulting in a drastic increase in location-based data generation and producing big data. Location data are often linked with user privacy, as they can reveal sensitive information such as the places visited by a person. Moreover, most location-based services involve resource-constrained devices
-
The FORA Fog Computing Platform for Industrial IoT Inform. Syst. (IF 2.466) Pub Date : 2021-02-02 Paul Pop; Bahram Zarrin; Mohammadreza Barzegaran; Stefan Schulte; Sasikumar Punnekkat; Jan Ruh; Wilfried Steiner
Industry 4.0 will only become a reality through the convergence of Operational and Information Technologies (OT & IT), which use different computation and communication technologies. Cloud Computing cannot be used for OT involving industrial applications, since it cannot guarantee stringent non-functional requirements, e.g., dependability, trustworthiness and timeliness. Instead, a new computing paradigm
-
Stochastic process mining: Earth movers’ stochastic conformance Inform. Syst. (IF 2.466) Pub Date : 2021-02-06 Sander J.J. Leemans; Wil M.P. van der Aalst; Tobias Brockhoff; Artem Polyvyanyy
Initially, process mining focused on discovering process models from event data, but in recent years the use and importance of conformance checking has increased. Conformance checking aims to uncover differences between a process model and an event log. Many conformance checking techniques and measures have been proposed. Typically, these take into account the frequencies of traces in the event log
-
Data variety, come as you are in multi-model data warehouses Inform. Syst. (IF 2.466) Pub Date : 2021-02-04 Sandro Bimonte; Enrico Gallinucci; Patrick Marcel; Stefano Rizzi
Multi-model DBMSs (MMDBMSs) have been recently introduced to store and seamlessly query heterogeneous data (structured, semi-structured, graph-based, etc.) in their native form, aimed at effectively preserving their variety. Unfortunately, when it comes to analyzing these data, traditional data warehouses (DWs) and OLAP systems fall short because they rely on relational DBMSs for storage and querying
-
Conformance checking of partially matching processes: An entropy-based approach Inform. Syst. (IF 2.466) Pub Date : 2021-01-20 Artem Polyvyanyy; Anna Kalenkova
Conformance checking is an area of process mining that studies methods for measuring and characterizing commonalities and discrepancies between processes recorded in event logs of IT-systems and designed processes, either captured in explicit process models or implicitly induced by information systems. Applications of conformance checking range from measuring the quality of models automatically discovered
-
Applying Reinforcement Learning towards automating energy efficient virtual machine consolidation in cloud data centers Inform. Syst. (IF 2.466) Pub Date : 2021-01-19 Rachael Shaw; Enda Howley; Enda Barrett
Energy awareness presents an immense challenge for cloud computing infrastructure and the development of next-generation data centers. Virtual Machine (VM) consolidation is one technique that can be harnessed to reduce energy-related costs and environmental sustainability issues of data centers. In recent times, intelligent learning approaches have proven to be effective for managing resources in cloud
-
Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training Inform. Syst. (IF 2.466) Pub Date : 2021-01-18 Dezhao Song; Andrew Vold; Kanika Madan; Frank Schilder
Multi-label document classification has a broad range of applicability to various practical problems, such as news article topic tagging, sentiment analysis, medical code classification, etc. A variety of approaches (e.g., tree-based methods, neural networks and deep learning systems that are specifically based on pre-trained language models) have been developed for multi-label document classification
-
Fast, scalable and geo-distributed PCA for big data analytics Inform. Syst. (IF 2.466) Pub Date : 2021-01-06 T. M. Tariq Adnan; Md. Mehrab Tanjim; Muhammad Abdullah Adnan
Principal Component Analysis (PCA) is a widely popular technique for reducing the dimensionality of a dataset. However, when the dimensionality of the dataset grows too large, existing state-of-the-art methods for PCA face scalability issues due to the explosion of intermediate data. Moreover, in a geographically distributed environment where most of today’s data are originally generated, these methods
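As a reminder of what PCA computes, the following single-machine Python sketch finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it. It assumes non-constant data and a start vector that is not orthogonal to the top eigenvector; the function names are hypothetical. The paper targets very high-dimensional, geo-distributed data, and its algorithm is not reproduced here.

```python
import math


def _center(rows):
    """Subtract the per-column mean from every row."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_component(rows, iters=200):
    """Unit vector along the direction of maximum variance (power iteration)."""
    centered = _center(rows)
    n, d = len(centered), len(centered[0])
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))  # assumes non-constant data (norm > 0)
        v = [x / norm for x in w]
    return v


def project(rows, v):
    """One-dimensional scores: each centered row projected onto v."""
    return [sum(c[j] * v[j] for j in range(len(v))) for c in _center(rows)]


rows = [(x, 2 * x) for x in range(5)]  # perfectly correlated toy data
pc1 = first_principal_component(rows)
print(pc1, project(rows, pc1))
```

On this toy data the component is proportional to (1, 2), so two dimensions collapse to one without losing variance. Materializing the d x d covariance matrix is exactly what becomes infeasible when d is very large.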
-
Anti-alignments—Measuring the precision of process models and event logs Inform. Syst. (IF 2.466) Pub Date : 2020-12-26 Thomas Chatain; Mathilde Boltenhagen; Josep Carmona
Processes are a crucial artifact in organizations, since they coordinate the execution of activities so that products and services are provided. The use of models to analyze the underlying processes is a well-known practice. However, due to the complexity and continuous evolution of their processes, organizations need an effective way of analyzing the relation between processes and models. Conformance
-
Efficient subspace search in data streams Inform. Syst. (IF 2.466) Pub Date : 2020-12-17 Edouard Fouché; Florian Kalinke; Klemens Böhm
In the real world, data streams are ubiquitous — think of network traffic or sensor data. Mining patterns, e.g., outliers or clusters, from such data must take place in real time. This is challenging because (1) streams often have high dimensionality, and (2) the data characteristics may change over time. Existing approaches tend to focus on only one aspect, either high dimensionality or the specifics
-
SJSON: A succinct representation for JSON documents Inform. Syst. (IF 2.466) Pub Date : 2020-12-13 Junhee Lee; Edman Anjos; Srinivasa Rao Satti
The massive amounts of data processed in modern computational systems are becoming a problem of increasing importance. This data is commonly stored directly or indirectly through the use of data exchange languages, such as JSON (JavaScript Object Notation) and XML (eXtensible Markup Language), for human-readable platform-agnostic access. This paper focuses on exploring a set of succinct representations
-
Enterprise-grade protection against e-mail tracking Inform. Syst. (IF 2.466) Pub Date : 2020-12-17 Benjamin Fabian; Benedict Bender; Ben Hesseldieck; Johannes Haupt; Stefan Lessmann
E-mail tracking provides companies with fine-grained behavioral data about e-mail recipients, which can be a threat to individual privacy and enterprise security. This problem is especially severe since e-mail tracking techniques often gather data without the informed consent of the recipients. So far, e-mail recipients lack a reliable protection mechanism. This article presents a novel protection
-
Computation of alignments of business processes through relaxation labelling and local optimal search Inform. Syst. (IF 2.466) Pub Date : 2020-12-16 Lluís Padró; Josep Carmona
A fundamental problem in conformance checking is aligning event data with process models. Unfortunately, existing techniques for this task are either complex or applicable only to restricted classes of models. In practice, this means that for large inputs, current techniques often fail to produce a result. In this paper we propose a method to compute alignments for unconstrained process models
-
On the composition of the long tail of business processes: Implications from a process mining study Inform. Syst. (IF 2.466) Pub Date : 2020-11-30 Marcus Fischer; Adrian Hofmann; Florian Imgrund; Christian Janiesch; Axel Winkelmann
Digital transformation forces companies to rethink their processes to meet current customer needs. Business Process Management (BPM) can provide the means to structure and tackle this change. However, most approaches to BPM face restrictions on the number of processes they can optimize at a time due to complexity and resource restrictions. Investigating this shortcoming, the concept of the long tail
-
Efficient top-k recently-frequent term querying over spatio-temporal textual streams Inform. Syst. (IF 2.466) Pub Date : 2020-12-05 Thu-Lan Dam; Sean Chester; Kjetil Nørvåg; Quang-Huy Duong
Massive amounts of data with spatio-temporal-textual information are being generated due to the proliferation of GPS-equipped mobile devices. Much of this data consists of social media posts, often used to share and spread personal updates and news. Exploring valuable information from a dynamic collection of social posts is of great interest and has attracted many studies. However, because the size of data
-
Aligning social concerns with information system security: A fundamental ontology for social engineering Inform. Syst. (IF 2.466) Pub Date : 2020-12-07 Tong Li; Xiaowei Wang; Yeming Ni
Along with the rapid development of socio-technical systems, people are playing an increasingly important role in information systems and have actually become an essential system component. However, unlike technology-based attacks that have been investigated for decades, social engineering attacks have not been effectively addressed. In particular, due to the interdisciplinary nature of social engineering
-
CryptDICE: Distributed data protection system for secure cloud data storage and computation Inform. Syst. (IF 2.466) Pub Date : 2020-10-30 Ansar Rafique; Dimitri Van Landuyt; Emad Heydari Beni; Bert Lagaisse; Wouter Joosen
Cloud storage allows organizations to store data at remote sites of service providers. Although cloud storage services offer numerous benefits, they also involve new risks and challenges with respect to data security and privacy aspects. To preserve confidentiality, data must be encrypted before outsourcing to the cloud. Although this approach protects the security and privacy aspects of data, it also
-
Conformance checking of mixed-paradigm process models Inform. Syst. (IF 2.466) Pub Date : 2020-11-26 Boudewijn F. van Dongen; Johannes De Smedt; Claudio Di Ciccio; Jan Mendling
Mixed-paradigm process models integrate strengths of procedural and declarative representations like Petri nets and Declare. They are specifically interesting for process mining because they allow capturing complex behaviour in a compact way. A key research challenge for the proliferation of mixed-paradigm models for process mining is the lack of corresponding conformance checking techniques. In this
-
ER-index: A referential index for encrypted genomic databases Inform. Syst. (IF 2.466) Pub Date : 2020-11-10 Ferdinando Montecuollo; Giovanni Schmid
Huge DBMSs storing genomic information are being created and engineered for large-scale, comprehensive and in-depth analysis of human beings and their diseases. This paves the way for significant new approaches in medicine, but also poses major challenges for storing, processing and transmitting such large amounts of data in compliance with recent regulations concerning user privacy. We designed
-
Model-based trace variant analysis of event logs Inform. Syst. (IF 2.466) Pub Date : 2020-11-14 Mathilde Boltenhagen; Thomas Chatain; Josep Carmona
The comparison of trace variants of business processes opens the door for a fine-grained analysis of the distinctive features inherent in the executions of a process in an organization. The current approaches for trace variant analysis do not consider the situation where a process model is present, and therefore, it can guide the derivation of the trace variants by considering high-level structures
-
D2IA: User-defined interval analytics on distributed streams Inform. Syst. (IF 2.466) Pub Date : 2020-11-13 Ahmed Awad; Riccardo Tommasini; Samuele Langhi; Mahmoud Kamel; Emanuele Della Valle; Sherif Sakr
Nowadays, modern Big Stream Processing Solutions (e.g. Spark, Flink) are working towards being the ultimate framework for streaming analytics. In order to achieve this goal, they started to offer extensions of SQL that incorporate stream-oriented primitives such as windowing and Complex Event Processing (CEP). The former enables stateful computation on infinite sequences of data items while the latter
-
Novel predictive model to improve the accuracy of collaborative filtering recommender systems Inform. Syst. (IF 2.466) Pub Date : 2020-11-03 Bushra Alhijawi; Ghazi Al-Naymat; Nadim Obeid; Arafat Awajan
The recommendation problem involves the prediction of a set of items that maximize the utility for users. Numerous factors, such as the filtering method and similarity measure, affect the prediction accuracy. We propose a novel prediction mechanism that can be applied to collaborative filtering recommender systems. This prediction mechanism consists of a novel adaptable predictive model, called inheritance-based
-
ProDB: A memory-secure database using hardware enclave and practical oblivious RAM Inform. Syst. (IF 2.466) Pub Date : 2020-11-10 Ziyang Han; Haibo Hu
One key challenge for data owners to host their databases in the cloud is data privacy. In this paper, we first demonstrate that even with the most recent hardware-based security technology such as Intel SGX, a hypervisor can still sniff key database operations running in its guest virtual machine (VM) such as the frequency and type of SQL queries, by monitoring the access pattern of this VM’s main
-
Requirements Engineering for Cyber Physical Production Systems: The e-CORE approach and its application Inform. Syst. (IF 2.466) Pub Date : 2020-11-10 Pericles Loucopoulos; Evangelia Kavakli; Julien Mascolo
Traditional manufacturing and production systems are in the throes of a digital transformation. By blending the real and virtual production worlds, it is now possible to connect all parts of the production process: devices, products, processes, systems and people, in an informational ecosystem. This paper examines the underpinning issues that characterise the challenges for transforming traditional
-
Topical affinity in short text microblogs Inform. Syst. (IF 2.466) Pub Date : 2020-10-24 Herman Masindano Wandabwa; M. Asif Naeem; Farhaan Mirza; Russel Pears
Knowledge-based applications like recommender systems in social networks are powered by a complex network of social discussions and user connections. Short text microblog platforms like Twitter are powerful in this respect due to their real-time content dissemination as well as their complex mesh of user connections. For example, users on Twitter tend to consume certain content to a greater or less
-
Orientation and conformance: A HMM-based approach to online conformance checking Inform. Syst. (IF 2.466) Pub Date : 2020-11-07 Wai Lam Jonathan Lee; Andrea Burattin; Jorge Munoz-Gama; Marcos Sepúlveda
Online conformance checking comes with new challenges, especially in terms of time and space constraints. One fundamental challenge of explaining the conformance of a running case is in balancing between making sense at the process level as the case reaches completion and putting emphasis on the current information at the same time. In this paper, we propose an online conformance checking framework
-
Sampling and approximation techniques for efficient process conformance checking Inform. Syst. (IF 2.466) Pub Date : 2020-10-26 Martin Bauer; Han van der Aa; Matthias Weidlich
Conformance checking enables organizations to automatically assess whether their business processes are executed according to their specification. State-of-the-art conformance checking algorithms perform this task by establishing alignments between behaviour recorded by IT systems and a process model capturing desired behaviour. While such alignments clearly highlight conformance issues, a major downside
-
Querying APIs with SPARQL Inform. Syst. (IF 2.466) Pub Date : 2020-10-26 Matthieu Mosser; Fernando Pieressa; Juan L. Reutter; Adrián Soto; Domagoj Vrgoč
Although the amount of RDF data has been steadily increasing over the years, the majority of information on the Web still resides in other formats and is often not accessible to Semantic Web services. A lot of this data is available through APIs serving JSON documents. In this work we propose a way of extending SPARQL with the option to consume JSON APIs and integrate this information into SPARQL
-
Scalable and data-aware SQL query recommendations Inform. Syst. (IF 2.466) Pub Date : 2020-09-18 Natalia Arzamasova; Klemens Böhm
SQL query recommendation suggests an SQL statement to a user, based on their submitted requests and on queries of other users stored in a log. Such methods need to be scalable and data-aware. Data awareness means that the filtering condition, the most crucial element of the recommendation, contains actual values. Otherwise, the query is not directly executable. Existing approaches do not satisfy the
-
A large reproducible benchmark of ontology-based methods and word embeddings for word similarity Inform. Syst. (IF 2.466) Pub Date : 2020-09-30 Juan J. Lastra-Díaz; Josu Goikoetxea; Mohamed Ali Hadj Taieb; Ana Garcia-Serrano; Mohamed Ben Aouicha; Eneko Agirre; David Sánchez
This work is a companion reproducibility paper of the experiments and results reported in Lastra-Diaz et al. (2019a), which is based on the evaluation of a companion reproducibility dataset with the HESML V1R4 library and the long-term reproducibility tool called Reprozip. Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorization, memory
-
A general framework for privacy-preserving of data publication based on randomized response techniques Inform. Syst. (IF 2.466) Pub Date : 2020-09-29 Chaobin Liu; Shixi Chen; Shuigeng Zhou; Jihong Guan; Yao Ma
Privacy preservation is a paramount concern in publishing datasets that contain sensitive information. Preventing privacy disclosure and providing useful information to legitimate users for data analysis/mining are conflicting goals. Randomized response is a class of techniques that perturbs each sensitive value in a certain way, so that personal privacy is protected while the overall trend of the entire
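The core idea of randomized response can be illustrated with Warner's classic scheme for a yes/no attribute: each respondent reports the true answer with probability p and its negation otherwise, so no single report is trustworthy, yet the population proportion can still be estimated. The Python sketch below assumes p != 0.5; the function names and the simulated population are hypothetical, and this is not the general framework proposed in the paper.

```python
import random


def randomize(truth: bool, p: float, rng: random.Random) -> bool:
    """Warner's scheme: report the true answer with probability p, else its negation."""
    return truth if rng.random() < p else not truth


def estimate_proportion(reported_yes_fraction: float, p: float) -> float:
    """Unbiased estimate of the true 'yes' proportion pi from perturbed reports.

    P(report yes) = p * pi + (1 - p) * (1 - pi), solved for pi (requires p != 0.5).
    """
    return (reported_yes_fraction - (1 - p)) / (2 * p - 1)


rng = random.Random(42)
p = 0.75
truths = [i < 3000 for i in range(10000)]  # simulated population, true proportion 0.3
reports = [randomize(t, p, rng) for t in truths]
print(estimate_proportion(sum(reports) / len(reports), p))
```

Choosing p closer to 0.5 strengthens privacy but inflates the variance of the estimate, which is the privacy/utility conflict described above.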
-
Feature-oriented engineering of declarative artifact-centric process models Inform. Syst. (IF 2.466) Pub Date : 2020-09-10 Rik Eshuis
Declarative artifact-centric process models are suitable for specifying knowledge-intensive processes. Currently, such models need to be designed from scratch, even though existing model fragments could be reused to gain efficiency in designing and maintaining declarative artifact-centric process models. To address this problem, this paper proposes an approach for composing model fragments, abstracted
-
A knowledge-intensive adaptive business process management framework Inform. Syst. (IF 2.466) Pub Date : 2020-09-10 Huseyin Kir; Nadia Erdogan
Business process management has been the driving force of optimization and operational efficiency for companies until now, but the digitalization era we have been experiencing requires businesses to be agile and responsive as well. In order to be a part of this digital transformation, delivering new levels of automation-fueled agility through digitalization of BPM itself is required. However, the automation
-
Cause vs. effect in context-sensitive prediction of business process instances Inform. Syst. (IF 2.466) Pub Date : 2020-09-14 Jens Brunk; Matthias Stierle; Leon Papke; Kate Revoredo; Martin Matzner; Jörg Becker
Predicting undesirable events during the execution of a business process instance provides the process participants with an opportunity to intervene and keep the process aligned with its goals. Few approaches for tackling this challenge consider a multi-perspective view, where the flow perspective of the process is combined with its surrounding context. Given the many sources of data in today’s world
-
Detection of batch activities from event logs Inform. Syst. (IF 2.466) Pub Date : 2020-09-10 Niels Martin; Luise Pufahl; Felix Mannhardt
Organizations carry out a variety of business processes in order to serve their clients. Process execution is usually supported by information technology and systems, and the resulting execution data is logged in an event log. Process mining uses this event log to discover the process’s control flow, its performance, information about the resources, etc. A common assumption is that the cases are executed independently of each other
-
On the appropriateness of Platt scaling in classifier calibration Inform. Syst. (IF 2.466) Pub Date : 2020-09-10 Björn Böken
Many applications using data mining and machine learning techniques require posterior probability estimates in addition to highly accurate predictions. Classifier calibration is a separate branch of machine learning that aims at transforming classifier predictions into posterior class probabilities; calibration methods are thus useful extensions in the respective applications. Among the existing state-of-the-art
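Platt scaling calibrates a classifier by fitting a sigmoid P(y=1 | f) = 1 / (1 + exp(A*f + B)) to its raw scores f. The minimal Python sketch below fits A and B by plain gradient descent on the log loss with Platt's smoothed targets; Platt's original procedure uses a Newton-type optimizer instead. The function names, learning rate, epoch count and toy scores are assumptions, and the sketch does not reproduce the paper's analysis of when Platt scaling is appropriate.

```python
import math


def fit_platt(scores, labels, lr=0.1, epochs=5000):
    """Fit A, B in P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on log loss."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets reduce overfitting on small calibration sets.
    t_pos = (n_pos + 1) / (n_pos + 2)
    t_neg = 1 / (n_neg + 2)
    targets = [t_pos if y else t_neg for y in labels]
    a, b = 0.0, math.log((n_neg + 1) / (n_pos + 1))  # start at the smoothed class prior
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            prob = 1 / (1 + math.exp(a * f + b))
            # With z = a*f + b and prob = sigmoid(-z), d(log loss)/dz = t - prob.
            grad_a += (t - prob) * f
            grad_b += t - prob
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(f, a, b):
    """Calibrated probability of the positive class for raw score f."""
    return 1 / (1 + math.exp(a * f + b))


# Hypothetical raw classifier scores with overlapping classes.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
print(a, b, [round(platt_probability(f, a, b), 3) for f in scores])
```

Because the mapping is monotone in f, Platt scaling changes probabilities but never the ranking of predictions.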
-
Controlled flexibility in blockchain-based collaborative business processes Inform. Syst. (IF 2.466) Pub Date : 2020-08-29 Orlenys López-Pintado; Marlon Dumas; Luciano García-Bañuelos; Ingo Weber
Blockchain technology enables the execution of collaborative business processes involving mutually untrusted parties. Existing tools allow such processes to be modeled using high-level notations and compiled into smart contracts that can be deployed on blockchain platforms. However, these tools do not provide mechanisms to cope with the flexibility requirements inherent to open and dynamic collaboration
-
Towards holistic Entity Linking: Survey and directions Inform. Syst. (IF 2.466) Pub Date : 2020-08-24 Italo L. Oliveira; Renato Fileto; René Speck; Luís P.F. Garcia; Diego Moussallem; Jens Lehmann
Entity Linking (EL) empowers Natural Language Processing applications by linking relevant mentions found in raw textual data to precise information about what they supposedly stand for. However, EL approaches have mostly focused on particular kinds of inputs and frequently fail to properly handle texts from specific sources (e.g., microblogs) that have particularities such as grammatical errors, slang
-
Collaborative filtering over evolution provenance data for interactive visual data exploration Inform. Syst. (IF 2.466) Pub Date : 2020-08-18 Houssem Ben Lahmar; Melanie Herschel
In interactive visual data exploration, users rely on recommendations on what data to explore next. EVLIN is a system that recommends queries to retrieve these data for the next exploration step, paired with suitable visualizations. This paper extends EVLIN by combining its content-based recommendations with recommendations leveraging collaborative filtering to improve the effectiveness of recommendation-based
-
OILog: An online incremental log keyword extraction approach based on MDP-LSTM neural network Inform. Syst. (IF 2.466) Pub Date : 2020-08-14 Xiaoyu Duan; Shi Ying; Hailong Cheng; Wanli Yuan; Xiang Yin
Log keyword extraction is an indispensable part of log anomaly detection. There are two main challenges in keyword extraction: first, logs are essentially unstructured, and different vendors usually define different log formats; second, most traditional methods cannot update the log keywords incrementally to match newly generated log data, so the extraction accuracy
-
In-situ visual exploration over big raw data Inform. Syst. (IF 2.466) Pub Date : 2020-08-07 Nikos Bikakis; Stavros Maroulis; George Papastefanatos; Panos Vassiliadis
Data exploration and visual analytics systems are of great importance in Open Science scenarios, where less tech-savvy researchers wish to access and visually explore big raw data files (e.g., JSON, CSV) generated by scientific experiments using commodity hardware and without being overwhelmed by the tedious processes of data loading, indexing and query optimization. In this paper, we present our work
-
Every apprentice needs a master: Feedback-based effectiveness improvements for process model matching Inform. Syst. (IF 2.466) Pub Date : 2020-08-04 Christopher Klinkmüller; Ingo Weber
Process models are a central element of modern business process management technology. When adopting such technology, organizations inevitably establish process model collections which, depending on the degree of adoption, can reach sizes of thousands of models. Process model matching techniques are intended to assist experts in the management of such large collections, e.g., in querying the collections
-
Knowledge-guided unsupervised rhetorical parsing for text summarization Inform. Syst. (IF 2.466) Pub Date : 2020-08-03 Shengluan Hou; Ruqian Lu
Automatic text summarization (ATS) has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale corpora. However, there is still no guarantee that the generated summaries are grammatical, concise, and convey all the salient information of the original documents. To make the summarization results more faithful, this paper presents an unsupervised
-
XChange: A semantic diff approach for XML documents Inform. Syst. (IF 2.466) Pub Date : 2020-08-01 Alessandreia Oliveira; Troy Kohwalter; Marcos Kalinowski; Leonardo Murta; Vanessa Braganholo
XML documents are extensively used in several applications and evolve over time. Identifying the semantics of these changes becomes a fundamental process to understand their evolution. Existing approaches related to understanding changes (diff) in XML documents focus only on syntactic changes. These approaches compare XML documents based on their structure, without considering the associated semantics
-
Privacy-aware data cleaning-as-a-service Inform. Syst. (IF 2.466) Pub Date : 2020-07-31 Yu Huang; Mostafa Milani; Fei Chiang
Data cleaning is a pervasive problem for organizations as they try to reap value from their data. Recent advances in networking and cloud computing technology have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. In this paper, we consider a Data Cleaning-as-a-Service model that allows a client to interact with a data
-
Exploiting semantic relationships for unsupervised expansion of sentiment lexicons Inform. Syst. (IF 2.466) Pub Date : 2020-07-29 Felipe Viegas; Mário S. Alvim; Sérgio Canuto; Thierson Rosa; Marcos André Gonçalves; Leonardo Rocha
The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be “close” in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces
-
DimensionSlice: A main-memory data layout for fast scans of multidimensional data Inform. Syst. (IF 2.466) Pub Date : 2020-07-25 Ilhyun Suh; Yon Dohn Chung
Multidimensional data are exploited in many application areas such as scientific data analysis, business intelligence, and geographic information systems. One of the most frequent operations applied to such multidimensional data is the selection of a subspace of the given multidimensional space, which involves predicate evaluation on multiple dimensions. Existing main-memory data layouts optimized
-
Fragments of bag relational algebra: Expressiveness and certain answers Inform. Syst. (IF 2.466) Pub Date : 2020-07-22 Marco Console; Paolo Guagliardo; Leonid Libkin
While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization
-
Relevance- and interface-driven clustering for visual information retrieval Inform. Syst. (IF 2.466) Pub Date : 2020-07-13 Mohamed Reda Bouadjenek; Scott Sanner; Yihao Du
Search results of spatio-temporal data are often displayed on a map, but when the number of matching search results is large, it can be time-consuming to individually examine all results, even when using methods such as filtered search to narrow the content focus. This suggests the need to aggregate results via a clustering method. However, standard unsupervised clustering algorithms like K-means (i)
-
Decentralized data access control over consortium blockchains Inform. Syst. (IF 2.466) Pub Date : 2020-07-09 Yaoliang Chen; Shi Chen; Jiao Liang; Lance Warren Feagan; Weili Han; Sheng Huang; X. Sean Wang
Blockchain is an emerging data management technology that enables people in a collaborative network to establish trusted connections with the other participants. Recently, consortium blockchains have attracted interest in the broader blockchain technology discussion. Instead of a fully public, autonomous network, a consortium blockchain supports a network where participants can be limited to a subset of users
-
Providing accurate answers to OLAP queries based on standardized moments of data cubes Inform. Syst. (IF 2.466) Pub Date : 2020-07-08 Elaheh Pourabbas
In this paper, we focus on the problem of providing accurate estimates of a target data cube from sets of source data cubes that share the same summary measures. We investigate the acyclic and cyclic schemas of data sources and show that a more accurate target data cube can be computed on the basis of the third and fourth standardized moments (i.e., skewness and kurtosis, respectively) of the source
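For reference, the third and fourth standardized moments are the third and fourth central moments divided by the second central moment raised to the powers 3/2 and 2. The Python sketch below computes them in population form for a list of values; the function name is hypothetical, kurtosis here is the non-excess version (3 for a normal distribution), and the paper's method for estimating target data cubes from these moments is not reproduced.

```python
def standardized_moments(values):
    """Return (skewness, kurtosis) as the 3rd and 4th standardized moments (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (2nd central moment)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    # Assumes non-constant data, so that m2 > 0.
    return m3 / m2 ** 1.5, m4 / m2 ** 2


print(standardized_moments([1, 2, 3, 4, 5]))  # (0.0, 1.7)
```

A symmetric sample has zero skewness; the value 1.7 < 3 indicates lighter tails than a normal distribution.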
Contents have been reproduced by permission of the publishers.