Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data
ACM Transactions on Knowledge Discovery from Data ( IF 3.6 ) Pub Date : 2020-08-05 , DOI: 10.1145/3399671
Tingting Wang 1 , Lei Duan 1 , Guozhu Dong 2 , Zhifeng Bao 3

Much recent work across different domains has addressed detecting outliers in relational data and analyzing their outlierness. However, although sequence data is ubiquitous in real life, analyzing the outlierness of sequence data has not received enough attention. In this article, we study the problem of mining outlying sequence patterns in sequence data: given a query sequence s in a sequence dataset D, the objective is to discover the sequence patterns that best indicate the unusualness (i.e., outlierness) of s compared against the other sequences. Technically, we measure the outlierness of a sequence by its rank, which is defined by the average probabilistic strength (aps) of a sequence pattern in that sequence. A minimal sequence pattern for which the query sequence is ranked the highest is then defined as an outlying sequence pattern. To address this problem, we present OSPMiner, a heuristic method that computes aps by incorporating several pruning techniques. Our empirical study on both real and synthetic data demonstrates that OSPMiner is effective and efficient.

Updated: 2020-08-05