Current journal: Information Systems
  • Understanding Service-Oriented Architecture (SOA): A systematic literature review and directions for further investigation
    Inform. Syst. (IF 2.066) Pub Date : 2020-01-17
    Naghmeh Niknejad; Waidah Ismail; Imran Ghani; Behzad Nazari; Mahadi Bahari; Ab Razak Bin Che Hussin

    Service-Oriented Architecture (SOA) has emerged as an architectural approach that enhances the service delivery performance of existing traditional systems while still retaining their most important features. This approach, due to its flexibility of adoption, has gained the attention of both academic and business entities, especially in the development of world-leading technologies such as Cloud Computing (CC) and the Internet of Things (IoT). Although many studies have listed the success factors of SOA, a few failures have also been reported in the literature. Despite the availability of rich material on SOA, there is a lack of systematic reviews covering the different aspects of the SOA concept in Information Systems (IS) research. Therefore, the central objective of this study is to review existing issues of SOA and share the findings with academia. Hence, a systematic literature review (SLR) was conducted to analyse existing studies related to SOA and the factors that led to SOA success and failure from 2009 to 2019. To completely cover all SOA-related research in the IS field, a two-stage review protocol that included automatic and manual searching was applied, resulting in 103 primary studies. The articles were categorised into four research themes, namely: SOA Adoption, SOA Concepts, SOA Impact, and SOA Practice. The result shows that academic research interest in SOA has increased recently, with most of the articles covering SOA Practice, followed by SOA Adoption. Moreover, the findings of this review highlight SOA Governance, SOA Strategy, Financial Issues and Costs, and Education and Training as the most significant factors of SOA adoption and implementation. Consequently, the outcomes will assist professionals and experts in organisations, as well as academic researchers, to focus more on these factors for successfully adopting and implementing SOA.

  • Discovering and merging related analytic datasets
    Inform. Syst. (IF 2.066) Pub Date : 2020-01-17
    Rutian Liu; Eric Simon; Bernd Amann; Stéphane Gançarski

    The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, despite the profusion of available datasets, it remains quite difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. This article describes a model and algorithms that exploit automatically extracted and user-defined semantic relationships for extending analytic datasets with new atomic or aggregated attribute values. Our framework is implemented as a REST service in SAP HANA and includes a careful theoretical analysis and practical solutions for several complex data quality issues.

  • Improving malicious URLs detection via feature engineering: Linear and nonlinear space transformation methods
    Inform. Syst. (IF 2.066) Pub Date : 2020-01-15
    Tie Li; Gang Kou; Yi Peng

    In malicious URL detection, traditional classifiers are challenged because the data volume is huge, patterns change over time, and the correlations among features are complicated. Feature engineering plays an important role in addressing these problems. To better represent the underlying problem and improve the performance of classifiers in identifying malicious URLs, this paper proposes a combination of linear and non-linear space transformation methods. For linear transformation, a two-stage distance metric learning approach was developed: first, singular value decomposition was performed to obtain an orthogonal space, and then linear programming was used to solve for an optimal distance metric. For nonlinear transformation, we introduced the Nyström method for kernel approximation and used the revised distance metric in its radial basis function, so that the merits of both linear and non-linear transformations can be utilized. A total of 331,622 URLs with 62 features were collected to validate the proposed feature engineering methods. The results showed that the proposed methods significantly improved the efficiency and performance of certain classifiers, such as k-Nearest Neighbor, Support Vector Machine, and neural networks. The malicious URL identification rate of k-Nearest Neighbor increased from 68% to 86%, that of the linear Support Vector Machine from 58% to 81%, and that of the Multi-Layer Perceptron from 63% to 82%. We also developed a website to demonstrate a malicious URL detection system which uses the methods proposed in this paper. The system can be accessed at: http://url.jspfans.com.
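    The Nyström step of this approach can be illustrated with a minimal sketch (our own illustration, not the authors' code; the paper additionally plugs the learned distance metric into the RBF kernel, which is omitted here for brevity):

```python
import numpy as np

def nystrom_features(X, landmarks, gamma=1.0):
    """Approximate the RBF kernel feature map via the Nystrom method.

    X: (n, d) data; landmarks: (m, d) sampled points.
    Returns (n, m) features whose inner products approximate k(x, y).
    """
    def rbf(A, B):
        # Pairwise squared distances, then the RBF kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    W = rbf(landmarks, landmarks)   # (m, m) kernel among landmarks
    C = rbf(X, landmarks)           # (n, m) kernel from data to landmarks
    # Form W^{-1/2} by eigendecomposition; clip tiny eigenvalues for stability.
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt           # Phi, with Phi @ Phi.T ~ K
```

    The resulting explicit features let a linear classifier (e.g. a linear SVM) mimic a kernel method at a fraction of the cost, which is the usual motivation for Nyström approximation on large URL datasets.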

  • A comprehensive analysis of delayed insertions in metric access methods
    Inform. Syst. (IF 2.066) Pub Date : 2020-01-11
    Humberto Razente; Maria Camila N. Barioni; Regis M. Santos Sousa

    Similarity queries are fundamental operations for applications that deal with complex data. This paper presents MIA (Metric Indexing Assisted by auxiliary memory with limited capacity), a new delayed insertion approach that can be employed to create enhanced dynamic metric access methods through short-term memories. We present a comprehensive evaluation of delayed insertion methods for metric access methods while comparing MIA to dynamic forced reinsertions. Our experimental results show that metric access methods can benefit from these strategies, decreasing the node overlap, the number of distance calculations, the number of disk accesses, and the execution time to run k-nearest neighbor queries.

  • What does existing NeuroIS research focus on?
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-28
    Jie Xiong; Meiyun Zuo

    NeuroIS is a research field in which neuroscience theories and tools are used to better understand information systems phenomena. At present, NeuroIS is still an emerging field in information systems, and the number of available studies is limited. Because researchers who plan or execute NeuroIS research need to understand the status of the existing empirical research published in relevant journals, we have analyzed 78 empirical articles and put forward an integrative framework for understanding what existing NeuroIS research focuses on. Our framework is built upon stimulus–organism–response theory, which explains that stimulus factors can affect users’ psychological processes, which further lead to their responses. Then, we review the collected articles and summarize their findings to give more details of NeuroIS studies. Through this literature review, we identify several opportunities for future NeuroIS research in terms of influencing factors, measurement instruments, and subjects. We believe that our work will provide some meaningful insight for future NeuroIS research.

  • Comparing the expressiveness of downward fragments of the relation algebra with transitive closure on trees
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-08
    Jelle Hellings; Marc Gyssens; Yuqing Wu; Dirk Van Gucht; Jan Van den Bussche; Stijn Vansummeren; George H.L. Fletcher

    Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, projections, coprojections, intersection, and difference; this for Boolean queries and path queries on labeled and unlabeled structures. In all cases, we present the complete Hasse diagram. In particular, we establish, for each query language fragment that we study on trees, whether it is closed under difference and intersection.

  • Volunteering for Linked Data Wrapper maintenance: A platform perspective
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-08
    Iker Azpeitia; Jon Iturrioz; Oscar Díaz

    Linked Data Wrappers (LDWs) turn Web APIs into RDF end-points, leveraging the Linked Open Data cloud with current data. Unfortunately, LDWs are fragile upon upgrades of the underlying APIs, compromising LDW stability. Hence, for API-based LDWs to become a sustainable foundation for the Web of Data, we should recognize LDW maintenance as a continuous effort that outlives their breakout projects. This is not new in Software Engineering. Other projects in the past faced similar issues. The strategy: becoming open source and turning towards dedicated platforms. By making LDWs open, we permit others not only to inspect (hence increasing trust and consumption), but also to maintain (to cope with API upgrades) and reuse (to adapt for their own purposes). Promoting consumption, adaptation and reuse might all help to increase the user base, and in so doing might provide the critical mass of volunteers that current LDW projects lack. Drawing upon Helping Theory, we investigate three enablers of volunteering applied to LDW maintenance: impetus to respond, positive evaluation of contributing, and increasing awareness. Insights are fleshed out through SYQL, an LDW platform on top of Yahoo's YQL. Specifically, SYQL capitalizes on the YQL community (i.e. impetus to respond), provides annotation overlays to ease participation (i.e. positive evaluation of contributing), and introduces a Health Checker (i.e. increasing awareness). Evaluation is conducted with 12 subjects involved in maintaining someone else's LDWs. Results indicate that both the Health Checker and the annotation overlays provide utility as enablers of awareness and contribution.

  • Operator implementation of Result Set Dependent KWS scoring functions
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-18
    Vinay M.S.; Jayant R. Haritsa

    A popular approach to hosting Keyword Search Systems (KWS) on relational DBMS platforms is to employ the Candidate Network framework. The quality of a Candidate Network-based search is critically dependent on the scoring function used to rank the relevant answers. In this paper, we first demonstrate, through detailed empirical and conceptual analysis studies, that the Labrador scoring function provides the best user relevance among contemporary Candidate Network scoring functions. Efficiently incorporating the Labrador function, however, is rendered difficult due to its Result Set Dependent (RSD) characteristic, wherein the distribution of keywords in the query results influences the ranking. To address this RSD challenge, we investigate two mechanisms: (a) a simple wrapper approach that leverages existing RDBMS functionalities through an SQL wrapper, and (b) a more sophisticated operator approach wherein the database engine is augmented with custom operators that perform result ranking in the query execution plan. The above strategies have been implemented on a PostgreSQL codebase, inclusive of integration with the optimizer for the operator approach. A detailed empirical study over real-world data sets, including DBLP and Wikipedia, indicates that the wrapper approach addresses the RSD efficiency issue to a limited extent only. More encouragingly, the operator approach is extremely successful, delivering processing times that are comparable to, or better than, those of non-RSD implementations. We expect these results to aid in the organic hosting of KWS functionality on database systems.

  • To index or not to index: Time–space trade-offs for positional ranking functions in search engines
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-14
    Diego Arroyuelo; Senén González; Mauricio Marin; Mauricio Oyarzún; Torsten Suel; Luis Valenzuela

    Positional ranking functions, widely used in web search engines and related search systems, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time–space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether positional data should be indexed, and how. We show that there is a wide range of practical time–space trade-offs. Moreover, we show that using about 1.30 times the space of the positional data, we can store everything needed for efficient query processing, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half the space of current state-of-the-art alternatives with little impact on query processing time.

  • JSON: Data model and query languages
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-06
    Pierre Bourhis; Juan L. Reutter; Domagoj Vrgoč

    Despite the fact that JSON is currently one of the most popular formats for exchanging data on the Web, there are very few studies on this topic and there is no agreed-upon theoretical framework for dealing with JSON. Therefore, in this paper we propose a formal data model for JSON documents and, based on the common features present in available systems using JSON, we define a lightweight query language for navigating through JSON documents. We study the complexity of basic computational tasks associated with this language and compare its expressive power with practical languages for managing JSON data.
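    The flavour of such a navigational language can be conveyed with a minimal evaluator (the steps, wildcard, and descendant semantics below are our own simplification for illustration, not the paper's formal definitions):

```python
def navigate(doc, path):
    """Evaluate a tiny navigational query over a JSON-like document.

    path is a list of steps: a string selects that key in an object,
    '*' selects every child, and '**' descends to every subdocument.
    Returns the list of values reachable from the root via the path.
    """
    results = [doc]
    for step in path:
        nxt = []
        for node in results:
            if step == '**':
                # Recursive descent: collect the node and all descendants.
                stack = [node]
                while stack:
                    n = stack.pop()
                    nxt.append(n)
                    if isinstance(n, dict):
                        stack.extend(n.values())
                    elif isinstance(n, list):
                        stack.extend(n)
            elif step == '*':
                if isinstance(node, dict):
                    nxt.extend(node.values())
                elif isinstance(node, list):
                    nxt.extend(node)
            elif isinstance(node, dict) and step in node:
                nxt.append(node[step])
        results = nxt
    return results
```

    For example, `navigate(doc, ["store", "book", "*", "title"])` collects every book title, much as a JSONPath-style query would.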

  • Discovering instance and process spanning constraints from process execution logs
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-09
    Karolin Winter; Florian Stertz; Stefanie Rinderle-Ma

    Instance spanning constraints (ISC) are the instrument for establishing controls across multiple instances of one or several processes. A multitude of applications call for ISC support. Consider, for example, the bundling and unbundling of cargo across several instances of a logistics process, or dependencies between examinations in different medical treatment processes. Non-compliance with ISC can lead to severe consequences and penalties, e.g., dangerous effects due to undesired drug interactions. ISC might stem from regulatory documents, extracted by domain experts. Another source for ISC is process execution logs. Process execution logs store execution information for process instances and hence, inherently, the effects of ISC. Discovering ISC from process execution logs can support ISC design and implementation (if the ISC was not known beforehand) and the validation of the ISC during its lifetime. This work contributes a categorization of ISC as well as four algorithms for discovering ISC candidates from process execution logs. The discovered ISC candidates are put into the context of the associated processes and can be further validated with domain experts. The algorithms are prototypically implemented and evaluated on artificial and real-world process execution logs. The results facilitate ISC design as well as validation and hence contribute to digitalized ISC and compliance management.

  • An empirical evaluation of exact set similarity join techniques using GPUs
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-13
    Christos Bellas; Anastasios Gounaris

    Exact set similarity join is a notoriously expensive operation, for which several solutions have been proposed. Recently, there have been studies that present a comparative analysis using MapReduce or a non-parallel setting. Our contribution is that we complement these works by conducting a thorough evaluation of the state-of-the-art GPU-enabled techniques. These techniques are highly diverse in their key features, and our experiments manage to reveal the key strengths of each one. As we explain, in real-life applications there is no dominant solution. Depending on specific dataset and query characteristics, each solution, even one not using the GPU at all, has its own sweet spot. All our work is repeatable and extensible.
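    As background for why this operation is expensive, a classic CPU-side baseline is the prefix-filtered self-join (a standard technique, sketched here for illustration; it is not one of the GPU methods evaluated in the paper):

```python
import math

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def ssjoin(sets, threshold):
    """Exact set similarity self-join using the classic prefix filter.

    Sets are canonically sorted; two sets can only reach the Jaccard
    threshold if their prefixes share a token, so an inverted index
    over prefixes prunes most candidate pairs before verification.
    """
    sets = [sorted(s) for s in sets]
    index = {}          # token -> ids of sets having it in their prefix
    out = []
    for i, s in enumerate(sets):
        # Prefix length for Jaccard threshold t: |s| - ceil(t*|s|) + 1
        p = len(s) - math.ceil(threshold * len(s)) + 1
        cands = set()
        for tok in s[:p]:
            cands |= index.get(tok, set())
            index.setdefault(tok, set()).add(i)
        for j in cands:
            if jaccard(s, sets[j]) >= threshold:
                out.append((j, i))
    return out
```

    The filter-then-verify structure is what GPU approaches parallelize in different ways, which is one reason their sweet spots differ so much across datasets.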

  • Automated discovery of declarative process models with correlated data conditions
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-09
    Volodymyr Leno; Marlon Dumas; Fabrizio Maria Maggi; Marcello La Rosa; Artem Polyvyanyy

    Automated process discovery techniques enable users to generate business process models from event logs extracted from enterprise information systems. Traditional techniques in this field generate procedural process models (e.g., in the BPMN notation). When dealing with highly variable processes, the resulting procedural models are often too complex to be practically usable. An alternative approach is to discover declarative process models, which represent the behavior of the process as a set of constraints. Declarative process discovery techniques have been shown to produce simpler models than procedural ones, particularly for processes with high variability. However, the bulk of approaches for automated discovery of declarative process models focus on the control-flow perspective, ignoring the data perspective. This paper addresses the problem of discovering declarative process models with data conditions. Specifically, the paper tackles the problem of discovering constraints that involve two activities of the process such that each of these two activities is associated with a condition that must hold when the activity occurs. The paper presents and compares two approaches to the problem of discovering such conditions. The first approach uses clustering techniques in conjunction with a rule mining technique, while the second approach relies on redescription mining techniques. The two approaches (and their variants) are empirically compared using a combination of synthetic and real-life event logs. The experimental results show that the former approach outperforms the latter when it comes to re-discovering constraints artificially injected in a log. Also, the former approach is in most of the cases more computationally efficient. On the other hand, redescription mining discovers rules with higher confidence (and lower support) suggesting that it may be used to discover constraints that hold for smaller subsets of cases of a process.

  • Information system ecology: An application of dataphoric ascendancy
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-20
    Michael J. Pritchard; J.C. Martel

    Information systems, like biological systems, are susceptible to external perturbations. Similar to flora and fauna in a biome, species of data can be classified within a dataphora. While entropic properties and data geometries can be used to describe local species of data within a dataphora, they are not designed to describe the global properties of an information system or evaluate its stability. Ecologists have used information theory to describe macro-level properties of biological ecosystems and statistical tools to evaluate biological systems. This research leverages an ecological perspective to model information systems as living systems. Our findings support the theory of dataphoric ascendancy, with Wikipedia having a Diversity Index value of 0.68, within the range of 0.65 to 0.80 that indicates a balanced state. We further support our findings with additional evaluations of other ecosystems, including the predicted collapse of the information service known as the Digital Universe. This research allows an information system's stability to be (a) characterized and (b) predicted using ecological measures specific to the diversity of data within the ecosystem.
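    The abstract does not say which diversity index is used; as one common ecological choice, Shannon entropy normalized by richness (Pielou's evenness) yields a value in [0, 1], where higher means a more balanced distribution of data species:

```python
import math

def shannon_diversity(counts):
    """Shannon entropy H of a distribution of species (data type) counts."""
    total = sum(counts)
    ps = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in ps)

def evenness(counts):
    """Pielou's evenness: H / ln(richness), in [0, 1].

    1.0 means all species are equally abundant; values near 0 mean
    the system is dominated by a single species.
    """
    s = sum(1 for c in counts if c > 0)
    return shannon_diversity(counts) / math.log(s) if s > 1 else 0.0
```

    Under this (assumed) reading, a value like 0.68 would indicate a system that is neither monoculture-dominated nor uniformly flat.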

  • Compatible byte-addressable direct I/O for peripheral memory devices in Linux
    Inform. Syst. (IF 2.066) Pub Date : 2020-01-02
    Sung Hoon Baek; Ki-Woong Park

    Memory devices can be used as storage systems to provide lower latency than disk and flash storage can achieve. However, traditional buffered input/output (I/O) and direct I/O are not optimized for memory-based storage. Traditional buffered I/O involves a redundant memory copy through the disk cache. Traditional direct I/O does not support byte addressing. Memory-mapped direct I/O optimizes file operations for byte-addressable persistent memory that appears to the CPU as main memory. However, its interface is not always compatible with existing applications. In addition, it cannot be used for peripheral memory devices (e.g., networked memory devices and hardware RAM drives) that are not interfaced with the memory bus. This paper presents a new Linux I/O layer, byte direct I/O (BDIO), that can process byte-addressable direct I/O using the standard application programming interface. It requires no modification of existing application programs and can be used not only for memory but also for peripheral memory devices that are not addressable by a memory management unit. The proposed BDIO layer allows file systems and device drivers to easily support BDIO. The new I/O layer achieved 18% to 102% performance improvements in evaluation experiments conducted with online transaction processing, file server, and desktop virtualization storage workloads.

  • An Alternative View on Data Processing Pipelines from the DOLAP 2019 Perspective
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-27
    Oscar Romero; Robert Wrembel; Il-Yeol Song

    Data science requires constructing data processing pipelines (DPPs), which span diverse phases such as data integration, cleaning, pre-processing, and analysis. However, current solutions lack a strong data engineering perspective. As a consequence, DPPs are error-prone and inefficient with respect to both human effort and execution time. We claim that DPP design, development, testing, deployment, and execution should benefit from a standardized DPP architecture and from well-known data engineering solutions. This claim is supported by our experience in real projects and by trends in the field, and it opens new paths for research and technology. In this spirit, we outline five research opportunities that represent novel trends towards building DPPs. Finally, we note that the best DOLAP 2019 papers selected for the DOLAP 2019 Information Systems Special Issue fall into this category, which underlines the relevance of advanced data engineering for data science.

  • Detecting coherent explorations in SQL workloads
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-09
    Verónika Peralta; Patrick Marcel; Willeme Verdeaux; Aboubakar Sidikhy Diakhaby

    This paper presents a proposal aiming at better understanding a workload of SQL queries and detecting coherent explorations hidden within the workload. In particular, our work investigates SQLShare (Jain et al., 2016), a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community. According to the authors of Jain et al. (2016), this workload is the only one containing primarily ad-hoc hand-written queries over user-uploaded datasets. We analyzed this workload by extracting features that characterize SQL queries, and we investigated three different machine learning approaches that use these features to separate sequences of SQL queries into meaningful explorations. The first approach is unsupervised and based only on the similarity between contiguous queries. The second approach uses transfer learning to apply a model trained on a dataset where ground truth is available. The last approach uses weak labelling to predict the most probable segmentation from heuristics designed to label a training set. We ran several tests over various query workloads to evaluate and compare the proposed methods.
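    The first (unsupervised) approach can be sketched minimally, under the assumption that each query has already been reduced to a feature set such as the tables and attributes it references (the feature extraction and threshold here are illustrative, not the paper's):

```python
def segment(query_features, threshold=0.3):
    """Split a sequence of queries into coherent explorations.

    query_features: list of feature sets, one per query, in workload order.
    A new segment starts whenever the Jaccard similarity between two
    consecutive queries' feature sets drops below the threshold.
    """
    def sim(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 1.0

    segments, current = [], [query_features[0]]
    for prev, q in zip(query_features, query_features[1:]):
        if sim(prev, q) < threshold:
            segments.append(current)   # similarity break: close the segment
            current = []
        current.append(q)
    segments.append(current)
    return segments
```

    Queries touching the same tables and columns stay in one exploration; a shift to a new dataset starts a new one.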

  • Two-stage optimization for machine learning workflow
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-09
    Alexandre Quemy

    Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model to be deployed in production is a challenging task for both subject-matter experts and machine learning practitioners. For broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques have been developed in this direction, known as AutoML. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. These policies are agnostic to the meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.
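    The two-stage structure can be sketched with a naive random-search version (purely illustrative: the paper studies real meta-optimizers and smarter allocation policies; the fixed 50/50 split and default-config assumption below are ours):

```python
import random

def two_stage_search(pipelines, configs, evaluate, budget, split=0.5):
    """Allocate a fixed evaluation budget across two stages:
    (1) random search over data-pipeline candidates with a default config,
    (2) random search over algorithm configurations using the best pipeline.

    evaluate(pipeline, config) returns a validation score to maximize.
    """
    b1 = max(1, int(budget * split))
    # Stage 1: pick the pipeline that scores best under the default config.
    best_p = max((random.choice(pipelines) for _ in range(b1)),
                 key=lambda p: evaluate(p, configs[0]))
    # Stage 2: tune the algorithm configuration on the chosen pipeline.
    best_c = max((random.choice(configs) for _ in range(budget - b1)),
                 key=lambda c: evaluate(best_p, c))
    return best_p, best_c
```

    The `split` parameter is exactly the knob the paper's allocation policies aim to set intelligently rather than fixing a priori.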

  • Feedback driven improvement of data preparation pipelines
    Inform. Syst. (IF 2.066) Pub Date : 2019-12-06
    Nikolaos Konstantinou; Norman W. Paton

    Data preparation, whether for populating enterprise data warehouses or as a precursor to more exploratory analyses, is recognised as being laborious, and as a result is a barrier to cost-effective data analysis. Several steps that recur within data preparation pipelines are amenable to automation, but it seems important that automated decisions can be refined in the light of user feedback on data products. There has been significant work on how individual data preparation steps can be refined in the light of feedback. This paper goes further, by proposing an approach in which feedback on the correctness of values in a data product can be used to revise the results of diverse data preparation components. The approach uses statistical techniques, both in determining which actions should be applied to refine the data preparation process and to identify the values on which it would be most useful to obtain further feedback. The approach has been implemented to refine the results of matching, mapping and data repair components in the VADA data preparation system, and is evaluated using deep web and open government data sets from the real estate domain. The experiments have shown how the approach enables feedback to be assimilated effectively for use with individual data preparation components, and furthermore that synergies result from applying the feedback to several data preparation components.

  • CoPModL: Construction Process Modeling Language and Satisfiability Checking
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-27
    Elisa Marengo; Werner Nutt; Matthias Perktold

    Process modeling has been widely investigated in the literature, and several general-purpose approaches have been introduced, addressing a variety of domains. However, generality comes at the expense of the ability to model the details and peculiarities of a particular application domain. As acknowledged in the literature, known approaches predominantly focus on either control flow or data, thus neglecting the interplay between the two. Moreover, process instances are either not considered or are considered in isolation, neglecting, among other aspects, synchronization points among them. As a consequence, the model is an approximation of the real process, limiting its reliability and usefulness in particular domains. This observation emerged clearly in the context of a research project in the construction domain, where preliminary attempts to model inter-company processes showed the lack of an appropriate language. Building on a semi-formal language tested on real construction projects, in this paper we define CoPModL, a process modeling language that accounts both for activities and for the items on which activities are to be executed. The language supports the specification of different item-based dependencies among the activities, thus serving as a synchronization specification among several activity instances. We provide a formal semantics for the language in terms of LTL over finite traces. This paves the way for the development of automatic reasoning. In this respect, we investigate process model satisfiability and develop an effective algorithm to check it.

  • Design principles for the General Data Protection Regulation (GDPR): A formal concept analysis and its evaluation
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-20
    Damian A. Tamburri

    Data and software are nowadays one and the same: for this very reason, the European Union (EU) and other governments have introduced frameworks for data protection, a key example being the General Data Protection Regulation (GDPR). However, GDPR compliance is not straightforward: its text is written not by software or information engineers but by lawyers and policy-makers. As a design aid to information engineers aiming for GDPR compliance, as well as an aid to software users' understanding of the regulation, this article offers a systematic synthesis and discussion of it, distilled by the mathematical analysis method known as Formal Concept Analysis (FCA). By its principles, GDPR is synthesized as a concept lattice, that is, a formal summary of the regulation, featuring 144,372 records; its uses are manifold. For example, the lattice captures so-called attribute implications, the implicit logical relations across the regulation, and their intensity. These results can be used as drivers during systems and services (re-)design, development, and operation, or during information systems' refactoring towards greater GDPR consistency.
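    For readers unfamiliar with FCA, the core construction can be shown on a toy binary context (a brute-force enumeration for illustration only; real FCA tools, and certainly an analysis at GDPR scale, use far more efficient algorithms):

```python
from itertools import combinations

def concepts(objects, attrs, incidence):
    """Enumerate the formal concepts (extent, intent) of a binary context.

    incidence is a set of (object, attribute) pairs. A concept is a pair
    where the extent is exactly the set of objects sharing the intent,
    and the intent is exactly the set of attributes shared by the extent.
    """
    def common_attrs(objs):
        return {a for a in attrs if all((o, a) in incidence for o in objs)}

    def common_objs(ats):
        return {o for o in objects if all((o, a) in incidence for a in ats)}

    found = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = common_attrs(set(objs))   # close upward to attributes
            extent = common_objs(intent)       # close back down to objects
            found.add((frozenset(extent), frozenset(intent)))
    return found
```

    Ordering the concepts by extent inclusion yields the concept lattice; attribute implications can then be read off the intents.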

  • Formalising and animating multiple instances in BPMN collaborations
    Inform. Syst. (IF 2.066) Pub Date : 2019-11-01
    Flavio Corradini; Chiara Muzi; Barbara Re; Lorenzo Rossi; Francesco Tiezzi

    The increasing adoption of modelling methods contributes to a better understanding of the flow of processes, from the internal behaviour of a single organisation to a wider perspective where several organisations exchange messages. In this regard, BPMN collaborations provide a suitable modelling abstraction. Even though the notation is widely accepted, only limited effort has been expended on formalising its semantics, especially concerning the interplay among control features, data handling, and the exchange of messages in scenarios requiring multiple instances of interacting participants. In this paper, we address the problem of providing a formal semantics for BPMN collaborations including elements dealing with multiple instances, i.e., multi-instance pools and sequential/parallel multi-instance tasks. For an accurate account of these features, it is necessary to consider the data perspective of collaboration models, thus supporting data objects, data collections and data stores, and different execution modalities of tasks concerning atomicity and concurrency. Beyond defining a novel formalisation, we also provide a BPMN collaboration animator tool, named MIDA, that faithfully implements the formal semantics. MIDA can also support designers in debugging multi-instance collaboration models.

  • A DSL for WSN software components coordination
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-31
    Marcos Aurélio Carrero; Martin A. Musicante; Aldri Luiz dos Santos; Carmem S. Hara

    Wireless Sensor Networks (WSNs) have become an integral part of urban scenarios. They are usually composed of a large number of devices. Developing systems for such networks is a hard task and often involves validation in simulation environments before deployment in real settings. Component-based development allows systems to be built from reusable, existing components that share a common interface. This paper proposes a domain-specific language (DSL) for the coordination of WSN software components. The language provides high-level composition primitives that promote a flexible coordination execution flow and interaction between components. We present the language specification as well as a case study of in-network WSN data storage coordination. The current specification of the language generates code for the NS2 simulation environment. The case study shows that the language implements a flexible development model. Moreover, we analyze the code reusability promoted by the language and show that it reduces the programming effort in a component-based development framework.

  • Aligning observed and modelled behaviour by maximizing synchronous moves and using milestones
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-26
    Vincent Bloemen; Sebastiaan van Zelst; Wil van der Aalst; Boudewijn van Dongen; Jaco van de Pol

    Given a process model and an event log, conformance checking aims to relate the two, e.g. to detect discrepancies between them. For the synchronous product net of the process and a log trace, we can assign different costs to a synchronous move and to a move in the log or model. By computing a path through this (synchronous) product net whilst minimizing the total cost, we create a so-called optimal alignment, which is considered the primary target result for conformance checking. Traditional alignment-based approaches (1) have performance problems for larger logs and models, and (2) do not provide reliable diagnostics for non-conforming behaviour (e.g. bottleneck analysis is based on events that did not happen). This motivates exploring an alternative approach that maximizes the use of observed events. We also introduce the notion of milestone activities, i.e. unskippable activities, and show how the different approaches relate to each other. We propose a data structure, computable from the process model, which can be used for (1) computing alignments of many log traces that maximize synchronous moves, and (2) analysing non-conforming behaviour. In our experiments we show the differences between various alignment cost functions. We also show how the performance of constructing alignments with our data structure compares to that of state-of-the-art techniques.
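
    The cost-minimizing intuition behind optimal alignments can be sketched with a standard dynamic program over a log trace and a single linear run of the model (a deliberate simplification: real models branch, and the paper's data structure and milestone handling are not reproduced here):

```python
def optimal_alignment_cost(trace, model_run, c_sync=0, c_log=1, c_model=1):
    """Minimal total cost of aligning a log trace to one linear model run.

    A synchronous move (same activity in both) costs c_sync; a move only
    in the log costs c_log; a move only in the model costs c_model.
    """
    n, m = len(trace), len(model_run)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:                                   # log-only move
                d[i][j] = min(d[i][j], d[i - 1][j] + c_log)
            if j > 0:                                   # model-only move
                d[i][j] = min(d[i][j], d[i][j - 1] + c_model)
            if i > 0 and j > 0 and trace[i - 1] == model_run[j - 1]:
                d[i][j] = min(d[i][j], d[i - 1][j - 1] + c_sync)
    return d[n][m]
```

With c_sync = 0, minimizing total cost is equivalent to maximizing the number of synchronous moves, which is the objective the abstract emphasizes.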

  • BINet: Multi-perspective business process anomaly classification
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-26
    Timo Nolle; Stefan Luettgen; Alexander Seeliger; Max Mühlhäuser

    In this paper, we introduce BINet, a neural network architecture for real-time multi-perspective anomaly detection in business process event logs. BINet is designed to handle both the control flow and the data perspective of a business process. Additionally, we propose a set of heuristics for automatically setting the threshold of an anomaly detection algorithm. We demonstrate that BINet can be used to detect anomalies in event logs not only at the case level but also at the event attribute level. Finally, we demonstrate that a simple set of rules can be used to utilize the output of BINet for anomaly classification. We compare BINet to eight other state-of-the-art anomaly detection algorithms and evaluate their performance on an elaborate data corpus of 29 synthetic and 15 real-life event logs. BINet outperforms all other methods on both the synthetic and the real-life datasets.
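
    A generic stand-in for automatic thresholding (not BINet's actual heuristics): flag anomaly scores that lie more than k standard deviations above the mean score:

```python
import statistics

def auto_threshold(scores, k=2.0):
    """Mean + k * stdev heuristic; a generic stand-in, not BINet's own."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return mu + k * sigma

def classify(scores, tau):
    """True = anomalous, False = normal, given threshold tau."""
    return [s > tau for s in scores]
```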

  • Detecting trend deviations with generic stream processing patterns
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-22
    Massiva Roudjane; Djamal Rebaïne; Raphaël Khoury; Sylvain Hallé

    Information systems produce different types of event logs; in many situations, it may be desirable to look for trends inside these logs. We show how trends of various kinds can be computed over such logs in real time, using a generic framework called the trend distance workflow. Many common computations on event streams turn out to be special cases of this workflow, depending on how a handful of workflow parameters are defined. This process has been implemented and tested in a real-world event stream processing tool, called BeepBeep. Experimental results show that deviations from a reference trend can be detected in real time for streams producing up to thousands of events per second.
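
    The trend distance idea can be sketched as a sliding window whose aggregate is compared against a reference value; the window size, aggregate function, and threshold play the role of the workflow parameters (the names below are illustrative, not BeepBeep's API):

```python
from collections import deque

def trend_deviations(stream, reference, window=5, threshold=2.0):
    """Return indices at which the window mean strays from the reference.

    The mean is one possible trend aggregate; any statistic over the
    window could be substituted, as the workflow is parametric.
    """
    buf = deque(maxlen=window)
    alarms = []
    for i, x in enumerate(stream):
        buf.append(x)
        if len(buf) == window:
            mean = sum(buf) / window
            if abs(mean - reference) > threshold:
                alarms.append(i)
    return alarms
```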

  • Service contract modeling in Enterprise Architecture: An ontology-based approach
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-18
    Cristine Griffo; João Paulo A. Almeida; Giancarlo Guizzardi; Julio Cesar Nardi

    Service contracts bind parties legally, regulating their behavior in the scope of a (business) service relationship. Given that there are legal consequences attached to service contracts, understanding the elements of a contract is key to managing services in an enterprise. After all, provisions in a service contract and in legislation establish obligations and rights for service providers and customers that must be respected in service delivery. The importance of service contracts to service provisioning in an enterprise has motivated us to investigate their representation in enterprise models. We have observed that approaches fall into two extremes of a spectrum. Some approaches, such as ArchiMate, offer an opaque “contract” construct, not revealing the rights and obligations in the scope of the governed service relationship. Other approaches, under the umbrella term “contract languages”, are devoted exactly to the formal representation of the contents of contracts. Despite the applications of contract languages, they operate at a level of detail that does not match that of enterprise architecture models. In this paper, we explore and bridge the gap between these two extremes. We address the representation of service contract elements with a systematic approach: we first propose a well-founded service contract ontology, and then extend the ArchiMate language to reflect the elements of the service contract ontology. The applicability of the proposed extension is assessed in the representation of a real-world cloud service contract.

  • Enabling runtime flexibility in data-centric and data-driven process execution engines
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-17
    Kevin Andrews; Sebastian Steinau; Manfred Reichert

    Contemporary process management systems support users during the execution of predefined business processes. However, when unforeseen situations occur, which are not part of the process model serving as the template for process execution, contemporary technology is often unable to offer adequate user support. One solution to this problem is to allow for ad-hoc changes to process models, i.e., changes that may be applied on the fly to a running process instance. As opposed to the widespread activity-centric process modeling paradigm, for which the support of instance-specific ad-hoc changes is well researched, albeit not properly supported by most commercial process engines, there is no corresponding support for ad-hoc changes in other process support paradigms, such as artifact-centric or object-aware process management. This article presents concepts for supporting ad-hoc changes in data-centric and data-driven processes, and gives insights into the challenges to be tackled when implementing this kind of process flexibility in the PHILharmonicFlows process execution engine. We evaluated the concepts by implementing a proof-of-concept prototype and applying it to various scenarios. The development of advanced flexibility features is highly relevant for data-centric processes, as the research field is generally perceived as having low maturity compared to activity-centric processes.

  • Characterizing client usage patterns and service demand for car-sharing systems
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-11
    Victor A. Alencar; Felipe Rooke; Michele Cocca; Luca Vassio; Jussara Almeida; Alex Borges Vieira

    Understanding mobility in urban spaces is useful for the creation of smarter and more sustainable cities. However, obtaining data about urban mobility is challenging, since only a few companies have access to accurate and up-to-date data, which is also privacy-sensitive. In this work, we characterize three distinct car-sharing systems that operate in Vancouver (Canada) and nearby regions, gathering data for more than one year. Our study uncovers patterns in users’ habits and demands for these services. We highlight the common characteristics and the main differences among the car-sharing systems. Finally, we believe our study and data are useful for generating realistic synthetic workloads.

  • How meaningful are similarities in deep trajectory representations?
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-11
    Saeed Taghizadeh; Abel Elekes; Martin Schäler; Klemens Böhm

    Finding similar trajectories is an important task in moving object databases. However, classical similarity models face several limitations, including scalability and robustness. Recently, an approach named t2vec proposed transforming trajectories into points in a high-dimensional vector space, and this transformation approximately preserves distances between trajectories. t2vec overcomes the scalability limitation: it is now possible to cluster millions of trajectories. However, the semantics of the learned similarity values, and whether they are meaningful, is an open issue. One can ask: How does the configuration of t2vec affect the similarity values of trajectories? Is the notion of similarity in t2vec similar, different, or even superior to existing models? As for any neural-network-based approach, inspecting the network does not help to answer these questions. So the problem we address in this paper is how to assess the meaningfulness of similarity in deep trajectory representations. Our solution is a methodology based on a set of well-defined, systematic experiments. We compare t2vec to classical models in terms of robustness and their semantics of similarity, using two real-world datasets. We give recommendations on which model to use in possible application scenarios and use cases. We conclude that using t2vec in combination with classical models may be the best way to identify similar trajectories. Finally, to foster scientific advancement, we give the public access to all trained t2vec models and experiment scripts. To our knowledge, this is the biggest collection of its kind.

  • Speed prediction in large and dynamic traffic sensor networks
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-11
    Regis Pires Magalhaes; Francesco Lettich; Jose Antonio Macedo; Franco Maria Nardini; Raffaele Perego; Chiara Renso; Roberto Trani

    Smart cities are nowadays equipped with pervasive networks of sensors that monitor traffic in real-time and record huge volumes of traffic data. These datasets constitute a rich source of information that can be used to extract knowledge useful for municipalities and citizens. In this paper we are interested in exploiting such data to estimate future speed in traffic sensor networks, as accurate predictions have the potential to enhance decision making capabilities of traffic management systems. Building effective speed prediction models in large cities poses important challenges that stem from the complexity of traffic patterns, the number of traffic sensors typically deployed, and the evolving nature of sensor networks. Indeed, sensors are frequently added to monitor new road segments or replaced/removed due to different reasons (e.g., maintenance). Exploiting a large number of sensors for effective speed prediction thus requires smart solutions to collect vast volumes of data and train effective prediction models. Furthermore, the dynamic nature of real-world sensor networks calls for solutions that are resilient not only to changes in traffic behavior, but also to changes in the network structure, where the cold start problem represents an important challenge. We study three different approaches in the context of large and dynamic sensor networks: local, global, and cluster-based. The local approach builds a specific prediction model for each sensor of the network. Conversely, the global approach builds a single prediction model for the whole sensor network. Finally, the cluster-based approach groups sensors into homogeneous clusters and generates a model for each cluster. We provide a large dataset, generated from ∼1.3 billion records collected by up to 272 sensors deployed in Fortaleza, Brazil, and use it to experimentally assess the effectiveness and resilience of prediction models built according to the three aforementioned approaches. The results show that the global and cluster-based approaches provide very accurate prediction models that prove to be robust to changes in traffic behavior and in the structure of sensor networks.
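
    The contrast between the local and global approaches, and the cold-start fallback for newly added sensors, can be sketched with trivial mean predictors (illustrative only; the paper evaluates far richer models):

```python
def train_local(history):
    """history: {sensor_id: [observed speeds]} -> one mean model per sensor."""
    return {s: sum(v) / len(v) for s, v in history.items()}

def train_global(history):
    """A single mean model over all sensors of the network."""
    all_speeds = [x for v in history.values() for x in v]
    return sum(all_speeds) / len(all_speeds)

def predict(sensor, local_models, global_model):
    """Per-sensor model when available; global fallback for cold-start sensors."""
    return local_models.get(sensor, global_model)
```

A cluster-based variant would sit between the two: partition sensors into homogeneous groups and train one model per group.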

  • Detection and removal of infrequent behavior from event streams of business processes
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-09
    Sebastiaan J. van Zelst; Mohammadreza Fani Sani; Alireza Ostovar; Raffaele Conforti; Marcello La Rosa

    Process mining aims at gaining insights into business processes by analyzing the event data that is generated and recorded during process execution. The vast majority of existing process mining techniques works offline, i.e. using static, historical data, stored in event logs. Recently, the notion of online process mining has emerged, in which techniques are applied on live event streams, i.e. as the process executions unfold. Analyzing event streams allows us to gain instant insights into business processes. However, most online process mining techniques assume the input stream to be completely free of noise and other anomalous behavior. Hence, applying these techniques to real data leads to results of inferior quality. In this paper, we propose an event processor that enables us to filter out infrequent behavior from live event streams. Our experiments show that we are able to effectively filter out events from the input stream and, as such, improve online process mining results.
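
    A minimal sketch of frequency-based stream filtering (not the paper's actual event processor): keep an incoming event only if the directly-follows relation it completes has been observed often enough so far:

```python
from collections import defaultdict

class InfrequentBehaviorFilter:
    """Drop events whose directly-follows relation is rare in the stream so far."""

    def __init__(self, min_ratio=0.05):
        self.min_ratio = min_ratio
        self.last = {}               # case id -> last seen activity
        self.df = defaultdict(int)   # (prev, curr) -> observation count
        self.total = 0               # total directly-follows observations

    def process(self, case, activity):
        """Return True if the event is kept, False if filtered out."""
        prev = self.last.get(case)
        self.last[case] = activity
        if prev is None:
            return True              # first event of a case is always kept
        pair = (prev, activity)
        self.df[pair] += 1
        self.total += 1
        return self.df[pair] / self.total >= self.min_ratio
```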

  • Toward higher-level abstractions based on state machine for cloud resources elasticity
    Inform. Syst. (IF 2.066) Pub Date : 2019-10-09
    Hayet Brabra; Achraf Mtibaa; Walid Gaaloul; Boualem Benatallah

    With the dynamic nature of cloud applications and the rapid change of their resource requirements, elasticity over cloud resources has to be effectively supported. Elasticity is the ability to dynamically adjust the cloud resources that applications use in order to adapt to their varying workloads, while maintaining the desired quality of service. However, implementing elasticity is still a challenging task for cloud users, as the interfaces provided to manage cloud resources are heterogeneous and low-level. To alleviate this, we believe that elasticity features should be provided at the resource description level. In this paper, we propose a new Cloud Resource Description Model called cRDM, which is based on the State Machine formalism. Using this model, we aim to represent cloud resources together with their elasticity behavior over time, without referring to any low-level interfaces or cloud provider technical constraints. We also propose a software system based on this new specification to support the elasticity-aware orchestration of cloud resources by exploiting the underlying cloud orchestration tools and APIs. We rely on a real use case to demonstrate the applicability of the proposed system and conduct a set of experiments demonstrating the productivity and expressiveness of the cRDM model in comparison to existing solutions. The findings of our evaluation show the efficiency of our proposal.
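
    The state-machine view of elasticity can be sketched as follows (the states, events, and scaling side effects below are hypothetical, not cRDM's actual vocabulary):

```python
class ElasticResource:
    """Minimal state machine for a compute resource's elasticity behavior."""

    TRANSITIONS = {
        ("running", "high_load"): "scaling_out",
        ("running", "low_load"): "scaling_in",
        ("scaling_out", "done"): "running",
        ("scaling_in", "done"): "running",
    }

    def __init__(self):
        self.state = "running"
        self.instances = 1

    def fire(self, event):
        """Apply an event; reject transitions the machine does not define."""
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        if nxt == "scaling_out":
            self.instances += 1
        elif nxt == "scaling_in":
            self.instances = max(1, self.instances - 1)
        self.state = nxt
        return self.state
```

Describing elasticity as states and transitions, rather than as calls against a provider API, is what lets an orchestrator interpret the behavior independently of any particular cloud.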

  • Shared Ledger Accounting — Implementing the Economic Exchange pattern
    Inform. Syst. (IF 2.066) Pub Date : 2019-09-18
    Hans Weigand; Ivars Blums; Joost de Kruijff

    Distributed Ledger Technology (DLT) suggests a new way to implement Accounting Information Systems, but an ontologically sound consensus-based design is missing to date. To address this research gap, the paper introduces a DLT-based shared ledger solution in a formal way, compliant with Financial Reporting Standards. We build on the COFRIS accounting ontology (grounded on UFO) and the blockchain ontology developed by De Kruijff & Weigand that distinguishes between a Datalogical, an Infological and an Essential (conceptual) level. It is shown how both the consensual and the enterprise-specific parts of a business exchange transaction can be represented in a concise way, and how this pattern can be implemented using Smart Contracts. It is argued that the proposed Shared Ledger Accounting system increases the quality of the contents from an accounting perspective as well as the quality of the system in terms of auditability and interoperability.

  • Mining association rules for anomaly detection in dynamic process runtime behavior and explaining the root cause to users
    Inform. Syst. (IF 2.066) Pub Date : 2019-09-18
    Kristof Böhmer; Stefanie Rinderle-Ma

    Detecting anomalies in process runtime behavior is crucial: they might reflect, on the one hand, security breaches and fraudulent behavior and, on the other hand, desired deviations due to, for example, exceptional conditions. Both scenarios yield valuable insights for process analysts and owners, but they happen for different reasons and require different treatment. Hence a distinction between malign and benign anomalies is required. Existing anomaly detection approaches typically fall short in supporting experts who need to take this decision. An additional problem is false positives, which could result in selecting incorrect countermeasures. This paper proposes a novel anomaly detection approach based on association rule mining. It fosters the explanation of anomalies and the estimation of their severity. In addition, the approach is able to deal with process change and flexible executions, which potentially lead to false positives. This facilitates taking the appropriate countermeasure for a malign anomaly and avoiding the possible termination of benign process executions. The feasibility and result quality of the approach are shown by a prototypical implementation and by analyzing real-life logs with injected artificial anomalies. The explanatory power of the presented approach is evaluated through a controlled experiment with users.
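
    A minimal sketch of the rule-mining idea (illustrative only; the paper's approach mines richer rules and also estimates severity): mine high-confidence co-occurrence rules from past traces, then report the rules a new trace violates as candidate anomalies:

```python
from collections import Counter

def mine_rules(traces, min_conf=0.9):
    """Mine rules 'if activity a occurs, then b occurs too' from traces."""
    acts, pairs = Counter(), Counter()
    for trace in traces:
        present = set(trace)
        for a in present:
            acts[a] += 1
            for b in present:
                if a != b:
                    pairs[(a, b)] += 1
    # Keep a rule a -> b if b accompanied a often enough (confidence).
    return {(a, b) for (a, b), n in pairs.items() if n / acts[a] >= min_conf}

def anomalies(trace, rules):
    """Rules violated by a trace: antecedent present, consequent missing."""
    present = set(trace)
    return {(a, b) for (a, b) in rules if a in present and b not in present}
```

Each violated rule doubles as an explanation ("pay occurred but ship did not"), which is the kind of root-cause feedback the abstract emphasizes.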

  • A deep view-point language and framework for projective modeling
    Inform. Syst. (IF 2.066) Pub Date : 2019-09-18
    Colin Atkinson; Christian Tunjic

    Most view-based modeling approaches are today based on a “synthetic” approach in which the views hold all the information modeled about a system and are kept consistent using explicit, inter-view correspondence rules. The alternative “projective” approach, in which the contents of views are “projected” from a single underlying model on demand, is far less widely used due to the lack of suitable conceptual frameworks and languages. In this paper we take a step towards addressing this problem by presenting the foundations of a suitable language and conceptual framework for defining and applying views for projective modeling. The framework leverages deep modeling in order to seamlessly support views that exist at, and span, multiple levels of classification. The viewpoint language was developed in the context of Orthographic Software Modeling but is more generally applicable to any projective modeling approach.

  • Formal foundations for responsible application integration
    Inform. Syst. (IF 2.066) Pub Date : 2019-09-18
    Daniel Ritter; Stefanie Rinderle-Ma; Marco Montali; Andrey Rivkin

    Enterprise Application Integration (EAI) constitutes the cornerstone in enterprise IT landscapes that are characterized by heterogeneity and distribution. Starting from established Enterprise Integration Patterns (EIPs) such as Content-based Router and Aggregator, EIP compositions are built to describe, implement, and execute integration scenarios. The EIPs and their compositions must be correct at design and runtime in order to avoid functional errors or incomplete functionality. However, current EAI system vendors use many of the EIPs as part of their proprietary integration scenario modeling languages, which are not grounded on any formalism. This renders correctness guarantees for EIPs and their compositions impossible. This work thus advocates responsible EAI based on the formalization, implementation, and correctness of EIPs. For this, requirements on an EIP formalization are collected, and based on these requirements an extension of db-net, i.e., timed db-net, is proposed, fully equipped with execution semantics. It is shown how EIPs can be realized based on timed db-nets and how the correctness of these realizations can be proven. Moreover, the simulation of EIP realizations based on timed db-nets is enabled, which is essential for later implementation. The concepts are evaluated in many ways, including a proof-of-concept implementation and case studies. The EIP formalization based on timed db-nets constitutes the first step towards responsible EAI.

  • On modeling context-aware social collaboration processes.
    Inform. Syst. (IF 2.066) Pub Date : 2014-07-01
    Vitaliy Liptchinsky,Roman Khazankin,Stefan Schulte,Benjamin Satzger,Hong-Linh Truong,Schahram Dustdar

    Modeling collaboration processes is a challenging task. Existing modeling approaches are not capable of expressing the unpredictable, non-routine nature of human collaboration, which is influenced by the social context of the involved collaborators. We propose a modeling approach that considers collaboration processes as the evolution of a network of collaborative documents along with a social network of collaborators. Our modeling approach, accompanied by a graphical notation and formalization, makes it possible to capture the influence of complex social structures formed by collaborators, and therefore facilitates activities such as the discovery of socially coherent teams, social hubs, or unbiased experts. We demonstrate the applicability and expressiveness of our approach and notation, and discuss their strengths and weaknesses.

  • A Dimensionality Reduction Technique for Efficient Time Series Similarity Analysis.
    Inform. Syst. (IF 2.066) Pub Date : 2008-05-23
    Qiang Wang,Vasileios Megalooikonomou

    We propose a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. In contrast to piecewise constant approximation (PCA) techniques that approximate each time series with constant-value segments, the proposed method, Piecewise Vector Quantized Approximation, uses the closest (based on a distance measure) codeword from a codebook of key-sequences to represent each segment. The new representation is symbolic, and it allows for the application of text-based retrieval techniques to time series similarity analysis. Experiments on real and simulated datasets show that the proposed technique generally outperforms PCA techniques in clustering and similarity searches.
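
    The encoding step can be sketched as follows (a simplified reading of the abstract: split the series into fixed-length segments, then replace each segment by the index of its nearest codeword):

```python
def pvqa_encode(series, codebook, seg_len):
    """Encode a series as a symbolic sequence of codeword indices.

    Each fixed-length segment is mapped to the codebook entry with the
    smallest squared Euclidean distance, yielding a discrete 'word' per
    segment that text-retrieval techniques can operate on.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    symbols = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        seg = series[start:start + seg_len]
        symbols.append(min(range(len(codebook)),
                           key=lambda k: dist(seg, codebook[k])))
    return symbols
```

In practice the codebook itself would be learned from the data (e.g. by clustering training segments); here it is supplied directly for illustration.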

  • Compliance monitoring in business processes: Functionalities, application, and tool-support.
    Inform. Syst. (IF 2.066) Pub Date : 2015-12-05
    Linh Thao Ly,Fabrizio Maria Maggi,Marco Montali,Stefanie Rinderle-Ma,Wil M P van der Aalst

    In recent years, monitoring the compliance of business processes with relevant regulations, constraints, and rules during runtime has evolved into a major concern in literature and practice. Monitoring not only refers to continuously observing possible compliance violations, but also includes the ability to provide fine-grained feedback and to predict possible compliance violations in the future. The body of literature on business process compliance is large, and approaches specifically addressing process monitoring are hard to identify. Moreover, proper means for the systematic comparison of these approaches are missing. Hence, it is unclear which approaches are suitable for particular scenarios. The goal of this paper is to define a framework of Compliance Monitoring Functionalities (CMFs) that enables the systematic comparison of existing and new approaches for monitoring compliance rules over business processes during runtime. To define the scope of the framework, related areas are first identified and discussed. The CMFs are harvested based on a systematic literature review and five selected case studies. The appropriateness of the selection of CMFs is demonstrated in two ways: (a) a systematic comparison with pattern-based compliance approaches and (b) a classification of existing compliance monitoring approaches using the CMFs. Moreover, the application of the CMFs is showcased using three existing tools that are applied to two realistic data sets. Overall, the CMF framework provides powerful means to position existing and future compliance monitoring approaches.

  • Dealing with change in process choreographies: Design and implementation of propagation algorithms.
    Inform. Syst. (IF 2.066) Pub Date : 2015-04-22
    Walid Fdhila,Conrad Indiono,Stefanie Rinderle-Ma,Manfred Reichert

    Enabling process changes constitutes a major challenge for any process-aware information system. This not only holds for processes running within a single enterprise, but also for collaborative scenarios involving distributed and autonomous partners. In particular, if one partner adapts its private process, the change might affect the processes of the other partners as well. Accordingly, it might have to be propagated to concerned partners in a transitive way. A fundamental challenge in this context is to find ways of propagating the changes in a decentralized manner. Existing approaches are limited with respect to the change operations considered as well as their dependency on a particular process specification language. This paper presents a generic change propagation approach that is based on the Refined Process Structure Tree, i.e., the approach is independent of a specific process specification language. Further, it considers a comprehensive set of change patterns. For all these change patterns, it is shown that the provided change propagation algorithms preserve consistency and compatibility of the process choreography. Finally, a proof-of-concept prototype of a change propagation framework for process choreographies is presented. Overall, comprehensive change support in process choreographies will foster the implementation and operational support of agile collaborative process scenarios.

Contents have been reproduced by permission of the publishers.