ProcK: Machine Learning for Knowledge-Intensive Processes
arXiv - CS - Databases, Pub Date: 2021-09-10, DOI: arxiv-2109.04881
Tobias Jacobs, Jingyi Yu, Julia Gastinger, Timo Sztyler

Process mining deals with the extraction of knowledge from business process execution logs. Traditional process mining tasks, such as process model generation or conformance checking, rely on a minimalistic feature set in which each event is characterized only by its case identifier, activity type, and timestamp. In contrast, the success of modern machine learning is based on models that take any available data as direct input and build layers of features automatically during training. In this work, we introduce ProcK (Process & Knowledge), a novel pipeline for building business process prediction models that take into account both sequential data in the form of event logs and rich semantic information represented in a graph-structured knowledge base. The hybrid approach enables ProcK to flexibly make use of all information residing in an organization's databases. Components that extract inter-linked event logs and knowledge bases from relational databases are part of the pipeline. We demonstrate the power of ProcK by training it for prediction tasks on the OULAD e-learning dataset, where we achieve state-of-the-art performance on the tasks of predicting student dropout from courses and predicting student success. We also apply our method to a number of additional machine learning tasks, including exam score prediction and early predictions that only take into account data recorded during the first weeks of the courses.
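To make the extraction step more concrete, here is a minimal Python sketch under stated assumptions. It is not ProcK's implementation: the toy tables, the column names (`student`, `course`, `resource`, `day`), the `lives_in` predicate, and the helpers `extract_event_log`, `extract_triples`, and `truncate_log` are all hypothetical. It shows one way rows of a relational database might be turned into an event log in which each event carries only a case identifier, activity type, and timestamp, plus a set of knowledge-base triples that reuse the same case identifiers, so the two structures stay inter-linked. The cutoff filter mirrors the idea of early prediction from data recorded in the first weeks of a course.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    case_id: str    # hypothetical case key: one student's enrollment in one course
    activity: str   # activity type
    timestamp: int  # assumed: day offset relative to course start


# Hypothetical toy tables; column names are illustrative, not the OULAD schema.
CLICKS = [  # one row per interaction of a student with a course resource
    {"student": "s1", "course": "c1", "resource": "r1", "day": 3},
    {"student": "s1", "course": "c1", "resource": "r2", "day": 1},
    {"student": "s2", "course": "c1", "resource": "r1", "day": 20},
]
RESOURCES = {"r1": "quiz", "r2": "forum"}  # resource id -> activity type
STUDENTS = [
    {"student": "s1", "region": "north"},
    {"student": "s2", "region": "south"},
]


def case_key(row):
    """Build the shared case identifier used by both the log and the triples."""
    return f"{row['student']}/{row['course']}"


def extract_event_log(clicks, resources):
    """Group rows into cases and order each case's events by timestamp."""
    log = defaultdict(list)
    for row in clicks:
        cid = case_key(row)
        log[cid].append(Event(cid, resources[row["resource"]], row["day"]))
    return {cid: sorted(evs, key=lambda e: e.timestamp) for cid, evs in log.items()}


def extract_triples(students, clicks):
    """Turn attribute columns and foreign-key links into (subject, predicate, object) triples."""
    triples = set()
    for s in students:
        triples.add((s["student"], "lives_in", s["region"]))  # attribute -> edge
    for row in clicks:
        cid = case_key(row)
        triples.add((cid, "student", row["student"]))  # link case node to student node
        triples.add((cid, "course", row["course"]))    # link case node to course node
    return triples


def truncate_log(log, cutoff_day):
    """Keep only events recorded before cutoff_day, e.g. for early prediction."""
    return {cid: [e for e in evs if e.timestamp < cutoff_day] for cid, evs in log.items()}


log = extract_event_log(CLICKS, RESOURCES)
triples = extract_triples(STUDENTS, CLICKS)
early = truncate_log(log, cutoff_day=14)
```

Here the `s1/c1` case yields the ordered activities `forum`, `quiz`, while the `s2/c1` case is empty after truncation at day 14 because its only event falls on day 20. Keying both structures on the same case identifier is the design choice that lets a sequence model and a graph model later be joined per case; how ProcK itself performs that join is not shown here.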

Updated: 2021-09-13