Distributed elephant herding optimization for grid-based privacy association rule mining
Data Technologies and Applications (IF 1.7), Pub Date: 2020-05-15, DOI: 10.1108/dta-07-2019-0104
Praveen Kumar Gopagoni, Mohan Rao S K

Purpose

Association rule mining extracts patterns and correlations from a database, but it requires long scanning times, and the computational cost of generating the rules is high. In addition, generating candidate rules with traditional association rule mining is expensive in both time and space, and the process is lengthy. To address these issues in existing methods and to produce privacy rules, the paper proposes grid-based privacy association rule mining.

Design/methodology/approach

The primary aim of the research is to design and develop a distributed elephant herding optimization (EHO) for grid-based privacy association rule mining from a database. The proposed rule generation method proceeds in two steps. In the first step, the rules are generated using the Apriori algorithm, an effective association rule mining algorithm. Conventionally, association rules are extracted from the input database on the basis of support and confidence; in the proposed model, these measures are replaced with holo-entropy and probability-based confidence. In the second step, the generated rules are passed to the grid-based privacy rule mining, which produces privacy-dependent rules using a novel optimization algorithm and a grid-based fitness. The novel optimization algorithm is developed by integrating the distributed concept into the EHO algorithm.
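For orientation, the conventional measures that the first step starts from can be sketched as follows. This is a minimal illustration of standard Apriori with plain support and confidence, not the paper's method: the abstract does not define holo-entropy or probability-based confidence, so they are not shown here. The function names, thresholds and toy transactions are assumptions made for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions (sets) that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent)."""
    denom = support(antecedent, transactions)
    if denom == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / denom

def apriori_rules(transactions, min_support, min_confidence):
    """Standard Apriori: level-wise frequent itemsets, then rules from each."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        k = len(level[0]) + 1
        prev = set(level)
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        level = [c for c in candidates
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    rules = []
    for itemset in frequent:
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                cons = itemset - ante
                conf = confidence(ante, cons, transactions)
                if conf >= min_confidence:
                    rules.append((ante, cons, conf))
    return rules

# Hypothetical toy data, only to exercise the functions.
toy = [{"bread", "milk"}, {"bread", "butter"},
       {"bread", "milk", "butter"}, {"milk"}]
rules = apriori_rules(toy, min_support=0.5, min_confidence=0.6)
```

In the proposed model, the two scoring functions would be swapped for the paper's measures while the level-wise candidate generation stays the same in spirit.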

Findings

The method is evaluated on the retail, chess, T10I4D100K and T40I10D100K databases from the Frequent Itemset Mining Dataset Repository to demonstrate the effectiveness of the distributed grid-based privacy association rule mining. The proposed method outperforms the existing methods by offering a higher degree of privacy and utility. Moreover, the distributed nature of the association rule mining facilitates parallel processing and generates the privacy rules without much computational burden. The rate of hiding capacity, the rate of information preservation and the rate of false rules generated for the proposed method are 0.4468, 0.4488 and 0.0654, respectively, which are better than those of the existing rule mining methods.

Originality/value

Data mining is performed in a distributed manner through grids that subdivide the input data. The rules are framed using Apriori-based association mining, a modification of the standard Apriori algorithm in which holo-entropy and probability-based confidence replace support and confidence. The mined rules do not by themselves assure privacy; hence, grid-based privacy rules are employed, which use adaptive elephant herding optimization (AEHO) to generate the privacy rules. The AEHO inherits the adaptive nature of the standard EHO, which yields the global optimal solution.
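As background for the optimizer named above, the following is a minimal sketch of standard EHO as it is commonly described: a clan-updating operator that pulls each elephant toward its clan's matriarch, and a separating operator that replaces the worst elephant in each clan. It is not the paper's distributed or adaptive variant, and it does not use the grid-based fitness, neither of which is specified in the abstract. The sphere objective, the parameter values (`alpha`, `beta`, clan sizes) and the uniform re-initialisation in the separating step are assumptions chosen for illustration.

```python
import random

def sphere(x):
    """Illustrative objective to minimise (not the paper's grid-based fitness)."""
    return sum(v * v for v in x)

def eho_minimize(fitness, dim, lower, upper, n_clans=3, clan_size=5,
                 alpha=0.5, beta=0.1, generations=50, seed=0):
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    def new_elephant():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    clans = [[new_elephant() for _ in range(clan_size)] for _ in range(n_clans)]
    # Track the best solution seen so far (bookkeeping added for this sketch).
    best = min((e for c in clans for e in c), key=fitness)[:]

    for _ in range(generations):
        for clan in clans:
            clan.sort(key=fitness)
            matriarch = clan[0]
            center = [sum(e[d] for e in clan) / clan_size for d in range(dim)]
            # Clan updating: the matriarch moves to a scaled clan centre ...
            updated = [[clip(beta * c) for c in center]]
            # ... and every other elephant moves toward the matriarch.
            for e in clan[1:]:
                updated.append([clip(e[d] + alpha * (matriarch[d] - e[d]) * rng.random())
                                for d in range(dim)])
            # Separating: the worst elephant leaves and is re-initialised.
            updated.sort(key=fitness)
            updated[-1] = new_elephant()
            clan[:] = updated
        candidate = min((e for c in clans for e in c), key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate[:]
    return best, fitness(best)

best, value = eho_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
```

Because each clan is updated independently within a generation, the clan loop is the natural unit to parallelise, which is one plausible reading of how a distributed variant could be organised.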




Updated: 2020-07-20