A regression-based algorithm for frequent itemsets mining
Data Technologies and Applications (IF 1.7), Pub Date: 2019-09-05, DOI: 10.1108/dta-03-2019-0037
Zirui Jia, Zengli Wang

Purpose

Frequent itemset mining (FIM) is a basic topic in data mining. Most FIM methods build an itemset database containing all possible itemsets and use predefined thresholds to determine whether an itemset is frequent. However, this approach has some deficiencies: it is better suited to discrete data than to ordinal/continuous data, for which it may produce computational redundancy, and some of its results are difficult to interpret. The purpose of this paper is to address this gap by proposing a new data mining method.
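To make the threshold-based approach described above concrete, here is a minimal sketch of classical support counting in Python. It is not taken from the paper: the function name `frequent_itemsets`, the toy transactions, and the 0.5 threshold are all illustrative assumptions. It enumerates candidate itemsets by size, which also shows why the candidate space grows quickly.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Illustrative brute-force enumeration (hypothetical helper, not the
    paper's algorithm). Support is the fraction of transactions that
    contain every item of the candidate itemset.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            support = count / n
            if support >= min_support:
                result[itemset] = support
                found_any = True
        if not found_any:
            # Supersets of infrequent itemsets cannot be frequent,
            # so no larger size needs to be checked.
            break
    return result


# Toy discrete data (assumed for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Each item here is either present or absent in a transaction. An ordinal or continuous attribute would first have to be split into discrete bins to fit this scheme, which is one plausible reading of the redundancy the abstract mentions.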

Design/methodology/approach

A regression pattern (RP) model is introduced, in which a regression model and the FIM method are combined to solve the existing problems. Using survey data from the computer technology and software professional qualification examination, a multiple linear regression model is selected to mine associations between items.
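The abstract does not specify how the RP model is built, so the following is only a hedged sketch of the general idea of reading associations from a multiple linear regression. Everything in it is an assumption made for illustration: the helper names `solve_linear_system` and `fit_ols`, the synthetic ordinal data, and the choice to report coefficients and R² as the "association". It uses ordinary least squares via the normal equations in plain Python; a numerical library would normally be preferred.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, target):
    """Fit target = b0 + b1*x1 + ... by least squares.

    Returns (coefficients, r_squared). Hypothetical helper, not the
    paper's RP algorithm.
    """
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic ordinal survey answers (assumed): two predictor items on a
# 1-5 scale, and a response built as 1 + 2*x1 + 0.5*x2 so the fit can
# be checked by hand.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
response = [1 + 2 * x1 + 0.5 * x2 for x1, x2 in predictors]

coef, r_squared = fit_ols(predictors, response)
print("intercept and slopes:", [round(c, 6) for c in coef])
print("R^2:", round(r_squared, 6))
```

In this reading, a single fitted equation summarizes how the response item moves with the predictor items across their whole ordinal range, instead of listing one frequent itemset per combination of discrete values.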

Findings

The proposed algorithm mines some interesting associations, and the results show that the method can be applied to ordinal/continuous data mining. The experiment with the RP model shows that, compared to FIM, computational redundancy decreases and the results contain more information.

Research limitations/implications

The proposed algorithm is designed for ordinal/continuous data and is expected to provide inspiration for data stream mining and unstructured data mining.

Practical implications

Compared to FIM, which mines associations between discrete items, the RP model can mine associations in ordinal/continuous data sets. Importantly, the RP model performs well in saving computational resources and mining meaningful associations.

Originality/value

The proposed algorithm provides a novel perspective on defining and mining associations.




Updated: 2019-09-05