A Robust and Versatile Multi-View Learning Framework for the Detection of Deviant Business Process Instances, International Journal of Cooperative Information Systems



A Robust and Versatile Multi-View Learning Framework for the Detection of Deviant Business Process Instances
International Journal of Cooperative Information Systems (IF 0.5), Pub Date: 2017-01-20, DOI: 10.1142/s0218843017400032
Alfredo Cuzzocrea, Francesco Folino, Massimo Guarascio, Luigi Pontieri


Increasing attention has been paid to the detection and analysis of “deviant” instances of a business process, that is, instances connected with some kind of “hidden” undesired behavior (e.g. frauds and faults). In particular, several recent works have addressed the problem of inducing a binary classification model (here named a deviance detection model) that can discriminate between deviant and normal traces, based on a set of historical log traces labeled as either deviant or normal. Current solutions apply standard classifier-induction methods to a feature-based representation of the given traces, where the features include sequence-based patterns extracted from the corresponding sequences of activities. However, there is no consensus on which kinds of patterns are the most suitable for such a task. Moreover, mixing multiple pattern families together may produce a heterogeneous, redundant and sparse representation of the traces that is likely to yield poor deviance detection models. In this paper, we propose an ensemble-learning method for solving this problem, where multiple base classifiers are trained on different feature-based views of the log (each obtained by mapping the traces onto a distinct collection of patterns). A stacking procedure combines the discovered base models into an overall probabilistic model that associates any new trace with an estimate of the probability that it reflects a deviant process instance. This helps the analyst prioritize the inspection of the cases that are more likely to be deviant. The method also takes advantage of all nonstructural data available in the log, and employs a resampling mechanism to deal with the rarity of deviances in the training log. It has been conceived as the core of a comprehensive framework for detecting and analyzing business process deviances.
The framework helps the analyst investigate suspected deviances, and provides feedback to the learning method for improving the accuracy of the discovered deviance detection models. Tests on several real-life datasets confirmed the validity of the approach, in terms of its capability to discover an accurate deviance detection model and to effectively exploit new (originally unlabeled) traces via active learning and self-training mechanisms.
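
The multi-view stacking idea described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The toy traces, the two pattern families (single activities and activity bigrams), the naive-Bayes-style base scorer, the logistic-regression combiner, the held-out split used to fit the combiner, and the round-robin oversampling are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter

# Hypothetical toy traces (activity sequences); label 1 = deviant, 0 = normal.
BASE_LOG = [
    (("register", "check", "approve", "pay"), 0),
    (("register", "check", "reject"), 0),
    (("register", "check", "approve", "pay"), 0),
    (("register", "check", "reject"), 0),
    (("register", "approve", "pay"), 1),   # check skipped
    (("register", "pay", "check"), 1),     # paid before the check
]
META_LOG = [  # held out from base training, used only to fit the combiner
    (("register", "check", "approve", "pay"), 0),
    (("register", "check", "reject"), 0),
    (("register", "check", "check", "approve", "pay"), 0),
    (("register", "approve", "pay"), 1),
    (("register", "pay", "check", "approve"), 1),
]

# Two assumed pattern families, each inducing its own view of a trace.
VIEWS = {
    "activities": lambda t: set(t),
    "bigrams": lambda t: set(zip(t, t[1:])),
}

def oversample(log):
    """Duplicate minority-class traces (round robin) until classes balance.
    Assumes both classes occur at least once in the log."""
    by_label = {0: [], 1: []}
    for trace, y in log:
        by_label[y].append((trace, y))
    minority, majority = sorted(by_label.values(), key=len)
    extra = [minority[i % len(minority)]
             for i in range(len(majority) - len(minority))]
    return log + extra

class PatternScorer:
    """Naive-Bayes-style base model over the patterns present in a trace."""
    def fit(self, feature_sets, labels):
        self.n = Counter(labels)
        self.counts = {0: Counter(), 1: Counter()}
        for fs, y in zip(feature_sets, labels):
            self.counts[y].update(fs)
        return self

    def predict_proba(self, fs):
        log_odds = math.log((self.n[1] + 1) / (self.n[0] + 1))
        for f in fs:
            if f in self.counts[0] or f in self.counts[1]:  # skip unseen patterns
                p1 = (self.counts[1][f] + 1) / (self.n[1] + 2)  # Laplace smoothing
                p0 = (self.counts[0][f] + 1) / (self.n[0] + 2)
                log_odds += math.log(p1 / p0)
        return 1 / (1 + math.exp(-log_odds))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(X, y, epochs=300, lr=0.5):
    """Plain stochastic gradient descent; returns [bias, w_1, ..., w_k]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            w[0] -= lr * g
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * g * xj
    return w

# Level 0: one base model per view, trained on the rebalanced base log.
balanced = oversample(BASE_LOG)
base_models = {
    name: PatternScorer().fit([view(t) for t, _ in balanced],
                              [y for _, y in balanced])
    for name, view in VIEWS.items()
}

def base_outputs(trace):
    return [base_models[name].predict_proba(view(trace))
            for name, view in VIEWS.items()]

# Level 1: the combiner learns from base-model outputs on held-out traces.
meta_w = fit_logistic([base_outputs(t) for t, _ in META_LOG],
                      [y for _, y in META_LOG])

def deviance_probability(trace):
    x = base_outputs(trace)
    return sigmoid(meta_w[0] + sum(w * xi for w, xi in zip(meta_w[1:], x)))

for trace in [("register", "check", "approve", "pay"),
              ("register", "approve", "pay")]:
    print(trace, round(deviance_probability(trace), 3))
```

Sorting new traces by `deviance_probability` would give the kind of inspection ranking the abstract mentions. In this toy log the activity view alone barely separates the two classes, while the bigram view does, which is the situation a learned combiner is meant to handle.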


Updated: 2017-01-20
