An interdisciplinary comparison of sequence modeling methods for next-element prediction
Software and Systems Modeling (IF 2), Pub Date: 2020-04-07, DOI: 10.1007/s10270-020-00789-3
Niek Tax, Irene Teinemaa, Sebastiaan J. van Zelst

Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) In the machine learning field methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining process discovery methods aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately captures the sequential behavior in the underlying data. Those sequence models are generative, i.e., they are able to predict what elements are likely to occur after a given incomplete sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.
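
To make the task concrete, here is a minimal sketch of next-element prediction, assuming a first-order Markov model (one of the machine learning methods named in the abstract). The function names `fit_markov` and `predict_next` and the toy traces are hypothetical illustrations, not the paper's implementation or datasets.

```python
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Count how often each element is directly followed by each other element."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, prefix):
    """Return the most frequent successor of the prefix's last element.

    Returns None when the prefix is empty or its last element was never
    observed with a successor during training.
    """
    if not prefix or prefix[-1] not in transitions:
        return None
    return transitions[prefix[-1]].most_common(1)[0][0]


# Toy execution traces, invented for illustration only.
traces = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]

model = fit_markov(traces)
prediction = predict_next(model, ["open"])
```

A first-order model conditions only on the last element of the incomplete sequence. Accuracy on this task would be measured by comparing such predictions against the element that actually follows each prefix in held-out sequences.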




Updated: 2020-04-22