NetNCSP: Nonoverlapping closed sequential pattern mining.
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-03-31, DOI: 10.1016/j.knosys.2020.105812
Youxi Wu 1, 2, 3 , Changrui Zhu 1 , Yan Li 4 , Lei Guo 2 , Xindong Wu 5, 6

Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM ignores repetitions of a pattern within a sequence. To address this, SPM with gap constraints was proposed, which avoids discovering too many useless patterns. Nonoverlapping SPM, a branch of gap-constrained SPM, requires that no two occurrences of a pattern use the same sequence letter at the same position of the pattern. Nonoverlapping SPM strikes a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally include redundant patterns. To reduce redundancy and improve mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP), based on the Nettree structure. NetNCSP has two key steps: support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies: inheriting, predicting, and determining. These strategies identify redundant patterns effectively because they can predict the frequency and closeness of patterns before the candidate patterns are generated. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furthermore, in biological experiments, NetNCSP mines the closed patterns in the SARS-CoV-2 and SARS viruses. The results show that the two viruses have similar pattern compositions but different combinations.
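
To make the nonoverlapping condition concrete, the following is a minimal Python sketch, not the NetNCSP algorithm itself. It assumes the usual reading of the definition: an occurrence is a tuple of sequence positions, one per pattern letter, with between `min_gap` and `max_gap` skipped characters between consecutive matches, and two occurrences are nonoverlapping when they never use the same sequence position for the same pattern position. The function names, the toy sequences, and the exhaustive search are illustrative assumptions; the paper computes support on a Nettree with backtracking instead.

```python
from itertools import combinations


def occurrences(seq, pattern, min_gap, max_gap):
    """List every occurrence of `pattern` in `seq` as a tuple of 0-based positions.

    Between two consecutive matched letters, the number of skipped
    characters must lie in [min_gap, max_gap].
    """
    found = []

    def extend(partial):
        j = len(partial)  # index of the next pattern letter to match
        if j == len(pattern):
            found.append(tuple(partial))
            return
        if j == 0:
            candidates = range(len(seq))
        else:
            start = partial[-1] + 1 + min_gap
            stop = min(len(seq), partial[-1] + 1 + max_gap + 1)
            candidates = range(start, stop)
        for pos in candidates:
            if seq[pos] == pattern[j]:
                extend(partial + [pos])

    extend([])
    return found


def nonoverlapping(occ_a, occ_b):
    """True if no sequence position is reused at the same pattern position."""
    return all(x != y for x, y in zip(occ_a, occ_b))


def brute_force_support(seq, pattern, min_gap, max_gap):
    """Size of the largest pairwise-nonoverlapping set of occurrences.

    Exhaustive and exponential: intended for toy inputs only.
    """
    occs = occurrences(seq, pattern, min_gap, max_gap)
    for k in range(len(occs), 0, -1):
        for subset in combinations(occs, k):
            if all(nonoverlapping(a, b) for a, b in combinations(subset, 2)):
                return k
    return 0


if __name__ == "__main__":
    # Occurrences of "AB" in "ABAB" with gap [0, 2]: (0, 1), (0, 3), (2, 3).
    # (0, 3) shares a position with each of the other two, so the support is 2.
    print(brute_force_support("ABAB", "AB", 0, 2))  # prints 2
    # In "AAA", (0, 1) and (1, 2) both use position 1, but for different
    # pattern positions, so they still count as nonoverlapping.
    print(brute_force_support("AAA", "AA", 0, 0))  # prints 2
```

The second call shows why the condition is weaker than forbidding any shared letter: a sequence letter may be reused as long as it plays a different role in the pattern each time.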

Updated: 2020-03-31